On Wed, Dec 6, 2023 at 4:34 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > On Mon, Dec 4, 2023 at 5:21 PM John Naylor <johncnaylo...@gmail.com> wrote:
> > > Given variable-length value support, RT_GET() would have to do > > > repalloc() if the existing value size is not big enough for the new > > > value, but it cannot as the radix tree doesn't know the size of each > > > stored value. > > > > I think we have two choices: > > > > - the value stores the "length". The caller would need to specify a > > function to compute size from the "length" member. Note this assumes > > there is an array. I think both aspects are not great. > > - the value stores the "size". Callers that store an array (as > > PageTableEntry's do) would compute length when they need to. This > > sounds easier. > > As for the second idea, do we always need to require the value to have > the "size" (e.g. int32) in the first field of its struct? If so, the > caller will be able to use only 4 bytes in embedded value cases (or > won't be able to use at all if the pointer size is 4 bytes). We could have an RT_SIZE_TYPE for varlen value types. That's easy. There is another way, though: (This is a digression into embedded values, but it does illuminate some issues even aside from that) My thinking a while ago was that an embedded value had no explicit length/size, but could be "expanded" into a conventional value for the caller. For bitmaps, the smallest full value would have length 1 and whatever size (For tid store maybe 16 bytes). This would happen automatically via a template function. Now I think that could be too complicated (especially for page table entries, which have more bookkeeping than vacuum needs) and slow. Imagine this as an embedded value: typedef struct BlocktableEntry { uint16 size; /* later: uint8 flags; for bitmap scan */ /* 64 bit: 3 elements , 32-bit: 1 element */ OffsetNumber offsets[( sizeof(Pointer) - sizeof(int16) ) / sizeof(OffsetNumber)]; /* end of embeddable value */ bitmapword words[FLEXIBLE_ARRAY_MEMBER]; } BlocktableEntry; Here we can use a slot to store up to 3 offsets, no matter how big they are. That's great because a bitmap could be mostly wasted space. But now the caller can't know up front how many bytes it needs until it retrieves the value and sees what's already there. If there are already three values, the caller needs to tell the tree "alloc this much, update this slot you just gave me with the alloc (maybe DSA) pointer, and return the local pointer". Then copy the 3 offsets into set bits, and set whatever else it needs to. With normal values, same thing, but with realloc. This is a bit complex, but I see an advantage The tree doesn't need to care so much about the size, so the value doesn't need to contain the size. For our case, we can use length (number of bitmapwords) without the disadvantages I mentioned above, with length zero (or maybe -1) meaning "no bitmapword array, the offsets are all in this small array". > > > Another idea is that the radix tree returns the pointer > > > to the slot and the caller updates the value accordingly. > > > > I did exactly this in v43 TidStore if I understood you correctly. If I > > misunderstood you, can you clarify? > > I meant to expose RT_GET_SLOT_RECURSIVE() so that the caller updates > the value as they want. Did my sketch above get closer to that? Side note: I don't think we can expose that directly (e.g. need to check for create or extend upwards), but some functionality can be a thin wrapper around it. > > However, for vacuum, we have all values that we need up front. That > > gives me an idea: Something like this insert API could be optimized > > for "insert-only": If we only free values when we free the whole tree > > at the end, that's a clear use case for David Rowley's proposed "bump > > context", which would save 8 bytes per allocation and be a bit faster. > > [1] (RT_GET for varlen values would use an aset context, to allow > > repalloc, and nodes would continue to use slab). > > Interesting idea and worth trying it. Do we need to protect the whole > tree as insert-only for safety? It's problematic if the user uses > mixed RT_INSERT() and RT_GET(). You're right, but I'm not sure what the policy should be.