DuckDB: Buffer Management
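DuckDB's buffer management is modeled on LeanStore [1]. It implements pointer swizzling so that it keeps as much of the performance of an in-memory database as possible while still coping with out-of-core workloads.
In DuckDB's buffer manager, a Block is the unit of caching, comparable to a Page in a traditional buffer manager. There are two kinds of Block: on-disk blocks (unswizzled blocks) and in-memory blocks (swizzled blocks). A block whose block id is less than MAXIMUM_BLOCK [approximately 2^62] is a block on disk; a block whose block id is greater than or equal to MAXIMUM_BLOCK is a block in memory. BlockManager::UnregisterBlock shows the distinction: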
    void BlockManager::UnregisterBlock(block_id_t block_id, bool can_destroy) {
        if (block_id >= MAXIMUM_BLOCK) {
            // in-memory buffer: destroy the buffer
            if (!can_destroy) {
                // buffer could have been offloaded to disk
                // remove the file
                buffer_manager.DeleteTemporaryFile(block_id);
            }
        } else {
            lock_guard<mutex> lock(blocks_lock);
            // on-disk block: erase from list of blocks in manager
            blocks.erase(block_id);
        }
    }
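The id-based split can be sketched in isolation. The snippet below is a minimal illustration, not DuckDB code: kMaximumBlock, BlockKind, ClassifyBlock and NeedsTemporaryFileCleanup are hypothetical names, and the threshold is simply taken to be 2^62 for the example (the exact constant in DuckDB may differ slightly).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for DuckDB's block_id_t and MAXIMUM_BLOCK.
using block_id_t = int64_t;
// Assumption: threshold taken as 2^62 for illustration only.
constexpr block_id_t kMaximumBlock = block_id_t(1) << 62;

enum class BlockKind { OnDisk, InMemory };

// Ids below the threshold name on-disk blocks;
// ids at or above it name in-memory buffers.
constexpr BlockKind ClassifyBlock(block_id_t block_id) {
    return block_id >= kMaximumBlock ? BlockKind::InMemory : BlockKind::OnDisk;
}

// Mirrors the branch in UnregisterBlock: only an in-memory buffer that
// must not be destroyed can have been offloaded to a temporary file,
// so only that case needs file cleanup.
constexpr bool NeedsTemporaryFileCleanup(block_id_t block_id, bool can_destroy) {
    return ClassifyBlock(block_id) == BlockKind::InMemory && !can_destroy;
}
```

Encoding the kind of block in the id itself means no extra flag has to be stored or looked up: a single comparison decides which code path applies.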
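RegisterMemory(idx_t block_size, bool can_destroy) creates and registers a new Block in memory. It takes no block id. Instead it accepts two parameters: block_size, the size of the block to allocate, and can_destroy, which says whether the block may simply be freed while it is temporarily not in use. When can_destroy is true, the buffer can be dropped outright once nobody is using it, instead of being offloaded to a temporary file first.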
    shared_ptr<BlockHandle> BufferManager::RegisterMemory(
            idx_t block_size, bool can_destroy) {
        D_ASSERT(block_size >= Storage::BLOCK_SIZE);
        auto alloc_size = GetAllocSize(block_size);
        // first evict blocks until we have enough memory to store this buffer
        unique_ptr<FileBuffer> reusable_buffer;
        auto res = EvictBlocksOrThrow(
            alloc_size, maximum_memory, &reusable_buffer,
            "could not allocate block of %lld bytes (%lld/%lld used) %s",
            alloc_size, GetUsedMemory(), GetMaxMemory()
        );
        auto buffer = ConstructManagedBuffer(block_size, move(reusable_buffer));
        // create a new block pointer for this block
        return make_shared<BlockHandle>(
            *temp_block_manager,
            ++temporary_id,
            move(buffer), can_destroy,
            alloc_size, move(res)
        );
    }
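The logic of RegisterMemory is straightforward. It first checks the memory limit: if adding this buffer would exceed the limit, EvictBlocksOrThrow evicts blocks that are not currently in use. It then allocates the buffer space, with ConstructManagedBuffer building a FileBuffer. The FileBuffer is where the data actually lives: on initialization it allocates block_size bytes in memory for the executor to use directly, and once the buffer has been written to disk it provides the interface for reading and writing the on-disk data. Finally, the managed buffer is moved into a BlockHandle, which takes over its management. The BlockHandle is created with an id taken from the atomic variable temporary_id, whose initial value is MAXIMUM_BLOCK [see the BufferManager constructor]. Because the id is produced by ++temporary_id, every newly registered block_id is strictly greater than MAXIMUM_BLOCK, which marks the new block as an in-memory block.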
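The id allocation step can be illustrated on its own. This is a minimal sketch under stated assumptions, not DuckDB's implementation: TemporaryIdAllocator and kMaximumBlock are hypothetical names, and the threshold is taken to be 2^62 for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using block_id_t = int64_t;
// Assumption: threshold taken as 2^62 for illustration only.
constexpr block_id_t kMaximumBlock = block_id_t(1) << 62;

// Hypothetical allocator for in-memory block ids.
class TemporaryIdAllocator {
public:
    // Pre-increment on std::atomic is a single atomic read-modify-write
    // and returns the new value, so concurrent callers get distinct ids.
    block_id_t Next() {
        block_id_t id = ++counter_;
        // The counter starts at the threshold, so every id handed out
        // lies strictly above it and is classified as in-memory.
        assert(id > kMaximumBlock);
        return id;
    }

private:
    std::atomic<block_id_t> counter_{kMaximumBlock};
};
```

Starting the counter at the threshold and pre-incrementing means the first id handed out is the threshold plus one, so an in-memory id can never collide with the id of an on-disk block.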
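RegisterBlock(block_id_t block_id) does not allocate a new block id. Instead, the caller passes in the block id of a block that already exists on disk, asking the buffer manager to manage that block. If a live handle for that id is already registered, it is returned; otherwise a new BlockHandle is created and recorded in the block map as a weak_ptr.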
    shared_ptr<BlockHandle> BlockManager::RegisterBlock(
            block_id_t block_id, bool is_meta_block) {
        lock_guard<mutex> lock(blocks_lock);
        // check if the block already exists
        auto entry = blocks.find(block_id);
        if (entry != blocks.end()) {
            // already exists: check if it hasn't expired yet
            auto existing_ptr = entry->second.lock();
            if (existing_ptr) {
                //! it hasn't! return it
                return existing_ptr;
            }
        }
        // create a new block pointer for this block
        auto result = make_shared<BlockHandle>(*this, block_id);
        // for meta block, cache the handle in meta_blocks
        if (is_meta_block) {
            meta_blocks[block_id] = result;
        }
        // register the block pointer in the set of blocks as a weak pointer
        blocks[block_id] = weak_ptr<BlockHandle>(result);
        return result;
    }
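The weak_ptr bookkeeping in RegisterBlock is a general pattern: the registry remembers handles without keeping them alive. The following stripped-down sketch shows only that pattern; Handle and HandleRegistry are hypothetical names and leave out everything else a real BlockHandle does.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

using block_id_t = int64_t;

// Hypothetical, minimal handle: it carries only the block id.
struct Handle {
    explicit Handle(block_id_t id) : block_id(id) {}
    block_id_t block_id;
};

// Hypothetical registry that hands out shared handles but stores only
// weak references, so it never extends a handle's lifetime.
class HandleRegistry {
public:
    std::shared_ptr<Handle> Register(block_id_t block_id) {
        std::lock_guard<std::mutex> lock(lock_);
        auto entry = handles_.find(block_id);
        if (entry != handles_.end()) {
            // lock() yields a non-null shared_ptr only while some
            // caller still holds the handle.
            if (auto existing = entry->second.lock()) {
                return existing;
            }
        }
        // No live handle: create one and remember it weakly,
        // overwriting any expired entry for the same id.
        auto result = std::make_shared<Handle>(block_id);
        handles_[block_id] = result;
        return result;
    }

private:
    std::mutex lock_;
    std::unordered_map<block_id_t, std::weak_ptr<Handle>> handles_;
};
```

Two callers that register the same id while a handle is alive share one object. Once every caller has released it, the handle is destroyed and the next registration builds a fresh one.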
Reference:
[1] Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. "LeanStore: In-Memory Data Management beyond Main Memory." IEEE International Conference on Data Engineering (ICDE), pp. 185-196, 2018.