Hi, On Tue, May 23, 2023 at 7:17 PM John Naylor <john.nay...@enterprisedb.com> wrote: > > I wrote: > > the current insert/delete paths are quite complex. Using bitmap heap scan > > as a motivating use case, I hope to refocus complexity to where it's most > > needed, and aggressively simplify where possible. > > Sometime in the not-too-distant future, I will start a new thread focusing on > bitmap heap scan, but for now, I just want to share some progress on making > the radix tree usable not only for that, but hopefully a wider range of > applications, while making the code simpler and the binary smaller. The > attached patches are incomplete (e.g. no iteration) and quite a bit messy, so > tar'd and gzip'd for the curious (should apply on top of v32 0001-03 + > 0007-09 ). >
Thank you for making progress on this. I agree with these directions overall. I have some comments and questions: > - With the latter in mind, searching the child within a node now returns the > address of the slot. This allows the same interface whether the slot contains > a child pointer or a value. Probably we can apply similar changes to the iteration as well. > * Other nodes will follow in due time, but only after I figure out how to do > it nicely (ideas welcome!) -- currently node32's two size classes work fine > for growing, but the code should be simplified before extending to other > cases.) Within the size class, we just alloc a new node of lower size class and do memcpy(). I guess it will be almost same as what we do for growing. It might be a good idea to support node shrinking within the size class for node32 (and node125 if we support). I don't think shrinking class-3 to class-1 makes sense. > > Limited support for "combined pointer-value slots". At compile-time, choose > either that or "single-value leaves" based on the size of the value type > template parameter. Values that are pointer-sized or less can fit in the > last-level child slots of nominal "inner nodes" without duplicated leaf-node > code. Node256 now must act like the previous 'node256 leaf', since zero is a > valid value. Aside from that, this was a small change. Yes, but it also means that we use pointer-sized value anyway even if the value size is less than that, which wastes the memory, no? > > What I've shared here could work (in principal, since it uses uint64 values) > for tidstore, possibly faster (untested) because of better code density, but > as mentioned I want to shoot for higher. For tidbitmap.c, I want to extend > this idea and branch at run-time on a per-value basis, so that a page-table > entry that fits in a pointer can go there, and if not, it'll be a full leaf. > (This technique enables more flexibility in lossifying pages as well.) > Run-time info will require e.g. an additional bit per slot. Since the node > header is now 3 bytes, we can spare one more byte in the node3 case. In > addition, we can and should also bump it back up to node4, still keeping the > metadata within 8 bytes (no struct padding). Sounds good. > I've started in this patchset to refer to the node kinds as "4/16/48/256", > regardless of their actual fanout. This is for readability (by matching the > language in the paper) and maintainability (should *not* ever change again). > The size classes (including multiple classes per kind) could be determined by > macros and #ifdef's. For example, in non-SIMD architectures, it's likely slow > to search an array of 32 key chunks, so in that case the compiler should > choose size classes similar to these four nominal kinds. If we want to use the node kinds used in the paper, I think we should change the number in RT_NODE_KIND_X too. Otherwise, it would be confusing when reading the code without referring to the paper. Particularly, this part is very confusing: case RT_NODE_KIND_3: RT_ADD_CHILD_4(tree, ref, node, chunk, child); break; case RT_NODE_KIND_32: RT_ADD_CHILD_16(tree, ref, node, chunk, child); break; case RT_NODE_KIND_125: RT_ADD_CHILD_48(tree, ref, node, chunk, child); break; case RT_NODE_KIND_256: RT_ADD_CHILD_256(tree, ref, node, chunk, child); break; Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com