Hello everyone,

I am participating in GSoC and would like to share the current status of my
project, which is devoted to an alternative data structure for buffer
management.

To get things going, I started with a basic implementation of the ART tree
for a single backend process, based on an open-source implementation of the
algorithm. That was a fairly straightforward task, and it helped me better
understand the data structure and PostgreSQL's convenient mechanisms for
local memory allocation.
For validation purposes, I also ported a couple of tests as a simple
extension with a function that reads a file with test data and runs the
basic operations on the tree: insert/search/delete.
The described changes are in the first two commits [0].
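
To give an idea of its shape, here is a rough sketch of such a test
function, written against a hypothetical ART API (art_create/art_insert/
art_search/art_delete are placeholder names, not the actual functions from
the commits):

#include "postgres.h"

#include "fmgr.h"
#include "storage/fd.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

/* hypothetical ART API; names are illustrative only */
typedef struct ARTree ARTree;
extern ARTree *art_create(MemoryContext cxt);
extern void art_insert(ARTree *tree, const uint8 *key, Size keylen, void *value);
extern void *art_search(ARTree *tree, const uint8 *key, Size keylen);
extern bool art_delete(ARTree *tree, const uint8 *key, Size keylen);

PG_FUNCTION_INFO_V1(artree_run_test);

Datum
artree_run_test(PG_FUNCTION_ARGS)
{
    char       *path = text_to_cstring(PG_GETARG_TEXT_PP(0));
    FILE       *fp = AllocateFile(path, "r");
    char        line[1024];
    ARTree     *tree;
    int64       nkeys = 0;

    if (fp == NULL)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));

    tree = art_create(CurrentMemoryContext);    /* backend-local tree */

    /* insert each line as a key, check it can be found, then delete it */
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        Size        len = strlen(line);

        art_insert(tree, (const uint8 *) line, len, (void *) 1);
        if (art_search(tree, (const uint8 *) line, len) == NULL)
            elog(ERROR, "inserted key was not found");
        if (!art_delete(tree, (const uint8 *) line, len))
            elog(ERROR, "key could not be deleted");
        nkeys++;
    }

    FreeFile(fp);
    PG_RETURN_INT64(nkeys);
}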

It is worth noting that this "src/backend/lib/artree.c" implementation will
not go any further and will be thrown away for the final patch. The reason
is the current dynahash implementation (which I examined later, while
trying to move the tree into shared memory), which supports both access
patterns: shared and local memory.
So I think the proper place for the tree data structure is next to
dynahash, i.e. in "src/backend/utils/tree/...", and eventually, with some
restrictions for the shared variant (fixed key length), it could be used in
other places in the system as well.
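
To make the parallel with dynahash concrete, this is the kind of entry
point I have in mind; it is a sketch only, with made-up names (TREECTL,
tree_create, TREE_*) modelled on HASHCTL/hash_create():

#include "postgres.h"

typedef struct ARTree ARTree;

typedef struct TREECTL
{
    Size        keysize;        /* fixed key length (required for shared variant) */
    Size        entrysize;      /* size of a leaf entry */
    MemoryContext hcxt;         /* memory context for a backend-local tree */
    void       *shmem_addr;     /* preallocated area for a shared tree */
    Size        shmem_size;
} TREECTL;

#define TREE_ELEM        0x0001     /* cf. HASH_ELEM */
#define TREE_CONTEXT     0x0002     /* cf. HASH_CONTEXT */
#define TREE_SHARED_MEM  0x0004     /* cf. HASH_SHARED_MEM */

extern ARTree *tree_create(const char *name, const TREECTL *info, int flags);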

The last two commits implement the simplest form of the buf-tree in shared
memory, alongside the existing hashtable. SharedBufTree simply repeats
every action that is performed on the hashtable and compares the results.
For now, it relies entirely on the synchronization that is performed for
the hashtable at the layer above, i.e. in BufferAlloc(...). Also, the
amount of memory used for the tree's nodes/leaves is currently based on
arbitrary numbers (working with the default 128MB of shared memory).
Operations that work on buffers in bulk (drop/truncate/checkpoint/...) are
not touched yet.
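
In code, the shadow lookup looks roughly like this; BufTreeLookup() is a
hypothetical counterpart of BufTableLookup(), everything else is the
existing buffer-mapping machinery:

#include "postgres.h"

#include "storage/buf_internals.h"
#include "storage/lwlock.h"

/* hypothetical counterpart of BufTableLookup() on the shared tree */
extern int BufTreeLookup(BufferTag *tag);

/*
 * Shadow lookup: while both structures coexist, the lookup that
 * BufferAlloc() does on the hashtable (under the buffer mapping
 * partition lock) is repeated on the tree and the results compared.
 */
static int
lookup_and_check(BufferTag *tag)
{
    uint32      hash = BufTableHashCode(tag);
    LWLock     *partitionLock = BufMappingPartitionLock(hash);
    int         buf_id;

    LWLockAcquire(partitionLock, LW_SHARED);
    buf_id = BufTableLookup(tag, hash);
    Assert(BufTreeLookup(tag) == buf_id);   /* tree and hashtable must agree */
    LWLockRelease(partitionLock);

    return buf_id;
}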

So, this is the plan I would like to follow next:

1) Work on synchronization.
2) Examine bulk operations on buffers and replace them with a tree (prefix)
   iterator (see the sketch after this list).
3) Find a reasonable way to size the memory for tree maintenance.
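
For item 2, the rough idea is to key the tree as (RelFileNode, ForkNumber,
BlockNumber), so that a bulk operation becomes a prefix scan instead of a
full scan of all buffer descriptors. A sketch with a hypothetical iterator
API (art_iter_prefix/art_iter_next are made-up names):

#include "postgres.h"

#include "storage/relfilenode.h"

/* hypothetical tree and prefix-iterator API */
typedef struct ARTree ARTree;
typedef struct ARTIterator ARTIterator;
extern ARTIterator *art_iter_prefix(ARTree *tree,
                                    const uint8 *prefix, Size prefixlen);
extern int  art_iter_next(ARTIterator *it);     /* buffer id, or -1 at end */

/*
 * Instead of scanning all NBuffers descriptors the way
 * DropRelFileNodeBuffers() does today, iterate only the keys whose
 * 12-byte prefix matches the relation.  Locking and the actual
 * invalidation are left out of this sketch.
 */
static void
drop_rel_buffers_sketch(ARTree *tree, RelFileNode rnode)
{
    ARTIterator *it = art_iter_prefix(tree, (const uint8 *) &rnode,
                                      sizeof(RelFileNode));
    int         buf_id;

    while ((buf_id = art_iter_next(it)) >= 0)
    {
        /* invalidate the buffer at buf_id here */
    }
}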

Ideas, suggestions and comments are welcome.

A dynamic data structure like a tree brings problems that do not exist for
the current hashtable implementation: the number of LWLocks in the
hashtable is fixed and equal to the number of partitions (128), and the
memory footprint for buckets and hash entries can easily be precomputed.
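
For comparison, this is roughly how the existing buffer mapping table keeps
both numbers static (simplified from buf_table.c and freelist.c): the lock
is chosen by hash modulo a compile-time constant, and the table is created
with a fixed number of entries, so the footprint can be estimated up front.

#include "postgres.h"

#include "storage/buf_internals.h"
#include "storage/bufmgr.h"

/*
 * The lock count is the compile-time constant NUM_BUFFER_PARTITIONS (128),
 * and the hashtable has a fixed number of entries, so its shared-memory
 * footprint is known before startup (compare StrategyShmemSize()).
 */
static Size
buf_mapping_shmem_size(void)
{
    /* one hash entry per buffer plus one spare entry per partition */
    return BufTableShmemSize(NBuffers + NUM_BUFFER_PARTITIONS);
}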

Currently, I am thinking about the idea of tree decomposition, so that it
is really a tree of trees: RelFileNode [12 bytes] (maybe + ForkNum) would
be the key of the top tree, and the rest of the buffer tag the key of the
per-relation trees of blocks. Each bottom tree could then have a single
private LWLock protecting its contents. Wouldn't that be overkill? If we
take a hypothetical example with 2 tablespaces (HDD + SSD), each with 500+
objects, the number of LWLocks would have to scale accordingly, or we would
be forced to reuse them by unloading block trees, which would have to be
done in some intelligent way.
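
Structurally, the decomposition I have in mind looks something like this
(all type names here are hypothetical):

#include "postgres.h"

#include "common/relpath.h"
#include "storage/lwlock.h"
#include "storage/relfilenode.h"

/* the top tree maps a relation (and maybe fork) to its block tree */
typedef struct RelTreeKey
{
    RelFileNode rnode;          /* 12 bytes: tablespace / database / relation */
    ForkNumber  forkNum;        /* possibly folded into the top-level key */
} RelTreeKey;

/* each bottom tree owns a private LWLock protecting its contents */
typedef struct BlockTree
{
    LWLock      lock;
    void       *root;           /* ART over BlockNumber -> buffer id */
} BlockTree;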

Another approach is to use lock coupling... with the penalty of cache 
invalidation.
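
For completeness, this is the kind of hand-over-hand descent I mean;
ARTNode and art_find_child() are hypothetical:

#include "postgres.h"

#include "storage/lwlock.h"

/* hypothetical node layout with an embedded per-node lock */
typedef struct ARTNode
{
    LWLock      lock;
    bool        is_leaf;
    /* ... key bytes, children, value ... */
} ARTNode;

extern ARTNode *art_find_child(ARTNode *node, const uint8 *key,
                               Size keylen, int depth);

/*
 * Lock coupling: the child's lock is taken before the parent's is
 * released, so a concurrent writer cannot restructure a node out from
 * under the reader.  The price is that the lock's cache line of every
 * visited node bounces between CPUs.
 */
static ARTNode *
art_lookup_coupled(ARTNode *root, const uint8 *key, Size keylen)
{
    ARTNode    *node = root;
    int         depth = 0;

    LWLockAcquire(&node->lock, LW_SHARED);
    while (!node->is_leaf)
    {
        ARTNode    *child = art_find_child(node, key, keylen, depth++);

        if (child == NULL)
        {
            LWLockRelease(&node->lock);
            return NULL;
        }
        LWLockAcquire(&child->lock, LW_SHARED);     /* lock child first... */
        LWLockRelease(&node->lock);                 /* ...then drop parent */
        node = child;
    }
    return node;                /* caller releases the leaf's lock */
}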

-- 
[0] https://github.com/mvpant/postgres/commits/artbufmgr


