On Wed, Oct 26, 2022 at 8:06 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
> On Mon, Oct 24, 2022 at 12:54 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > I've attached updated PoC patches for discussion and cfbot. From the
> > previous version, I mainly changed the following things:
> >
Thank you for the comments!

> > * Separate treatment of inner and leaf nodes
>
> Overall, this looks much better!
>
> > * Pack both the node kind and node count to an uint16 value.
>
> For this, I did mention a bitfield earlier as something we "could" do, but it
> wasn't clear we should. After looking again at the node types, I must not
> have thought through this at all. Storing one byte instead of four for the
> full enum is a good step, but saving one more byte usually doesn't buy
> anything because of padding, with a few exceptions like this example:
>
> node4: 4 + 4     + 4*8 = 40 bytes
> node4: 5 + 4+(7) + 4*8 = 48 bytes
>
> Even there, I'd rather not spend the extra cycles to access the members. And
> with my idea of decoupling size classes from kind, the variable-sized kinds
> will require another byte to store "capacity". Then, even if the kind gets
> encoded in a pointer tag, we'll still have 5 bytes in the base type. So I
> think we should assume 5 bytes from the start. (Might be 6 temporarily if I
> work on size decoupling first).

True. I'm going to start with 6 bytes and will consider reducing it to 5
bytes. Encoding the kind in a pointer tag could be tricky given DSA support,
so I'm currently thinking of packing the node kind and the node capacity
class into a uint8.

> (Side note, if you have occasion to use bitfields again in the future, C99
> has syntactic support for them, so no need to write your own shifting/masking
> code).

Thanks!

> > I've not done SIMD part seriously yet. But overall the performance
> > seems good so far. If we agree with the current approach, I think we
> > can proceed with the verification of decoupling node sizes from node
> > kind. And I'll investigate DSA support.
>
> Sounds good. I have some additional comments about v7, and after these are
> addressed, we can proceed independently with the above two items. Seeing the
> DSA work will also inform me how invasive pointer tagging will be.
> There will
> still be some performance tuning and cosmetic work, but it's getting closer.

I've made some progress on investigating DSA support. I've written a draft
patch for it, and the regression tests pass. I'll share it as a separate patch
for discussion along with the v8 radix tree patch.

While implementing DSA support, I realized that we may not need pointer
tagging to distinguish between a backend-local address and a dsa_pointer. In
order to get a backend-local address from a dsa_pointer, we need to pass the
dsa_area, like:

    node = dsa_get_address(tree->dsa, node_dp);

As shown above, the dsa_area used by a shared radix tree is stored in the
radix_tree struct, so we can tell whether the radix tree is shared by
checking (tree->dsa == NULL). That is, if it's shared we treat a pointer to a
radix tree node as a dsa_pointer, and if not, as a backend-local pointer. We
don't need to encode anything in the pointer.

> -------------------------
> 0001:
>
> +#ifndef USE_NO_SIMD
> +#include "port/pg_bitutils.h"
> +#endif
>
> Leftover from an earlier version?
>
> +static inline int vector8_find(const Vector8 v, const uint8 c);
> +static inline int vector8_find_ge(const Vector8 v, const uint8 c);
>
> Leftovers, causing compiler warnings. (Also see new variable shadow warning)

Will fix.

> +#else /* USE_NO_SIMD */
> + Vector8 r = 0;
> + uint8 *rp = (uint8 *) &r;
> +
> + for (Size i = 0; i < sizeof(Vector8); i++)
> + rp[i] = Min(((const uint8 *) &v1)[i], ((const uint8 *) &v2)[i]);
> +
> + return r;
> +#endif
>
> As I mentioned a couple versions ago, this style is really awkward, and
> potential non-SIMD callers will be better off writing their own byte-wise
> loop rather than using this API. Especially since the "min" function exists
> only as a workaround for lack of unsigned comparison in (at least) SSE2.
> There is one existing function in this file with that idiom for non-assert
> code (for completeness), but even there, inputs of current interest to us use
> the uint64 algorithm.

Agreed. Will remove the non-SIMD code.

> 0002:
>
> + /* XXX: should not to use vector8_highbit_mask */
> + bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8));
>
> Hmm?

It's an outdated memo of mine; will remove it.

> +/*
> + * Return index of the first element in chunks in the given node that is greater
> + * than or equal to 'key'. Return -1 if there is no such element.
> + */
> +static inline int
> +node_32_search_ge(rt_node_base_32 *node, uint8 chunk)
>
> The caller must now have logic for inserting at the end:
>
> + int insertpos = node_32_search_ge((rt_node_base_32 *) n32, chunk);
> + int16 count = NODE_GET_COUNT(n32);
> +
> + if (insertpos < 0)
> + insertpos = count; /* insert to the tail */
>
> It would be a bit more clear if node_*_search_ge() always returns the
> position we need (see the prototype for example). In fact, these functions
> are probably better named node*_get_insertpos().

Agreed.

> + if (likely(NODE_HAS_FREE_SLOT(n128)))
> + {
> + node_inner_128_insert(n128, chunk, child);
> + break;
> + }
> +
> + /* grow node from 128 to 256 */
>
> We want all the node-growing code to be pushed down to the bottom so that all
> branches of the hot path are close together. This provides better locality
> for the CPU frontend. Looking at the assembly, the above doesn't have the
> desired effect, so we need to write like this (also see prototype):
>
> if (unlikely( ! has-free-slot))
> grow-node;
> else
> {
> ...;
> break;
> }
> /* FALLTHROUGH */

Good point. Will change.
> + /* Descend the tree until a leaf node */
> + while (shift >= 0)
> + {
> + rt_node *child;
> +
> + if (NODE_IS_LEAF(node))
> + break;
> +
> + if (!rt_node_search_inner(node, key, RT_ACTION_FIND, &child))
> + child = rt_node_add_new_child(tree, parent, node, key);
> +
> + Assert(child);
> +
> + parent = node;
> + node = child;
> + shift -= RT_NODE_SPAN;
> + }
>
> Note that if we have to call rt_node_add_new_child(), each successive loop
> iteration must search it and find nothing there (the prototype had a separate
> function to handle this). Maybe it's not that critical yet, but something to
> keep in mind as we proceed. Maybe a comment about it to remind us.

Agreed. Currently rt_extend() is used to add upper nodes, but we probably
need another function to add lower nodes for this case.

> + /* there is no key to delete */
> + if (!rt_node_search_leaf(node, key, RT_ACTION_FIND, NULL))
> + return false;
> +
> + /* Update the statistics */
> + tree->num_keys--;
> +
> + /*
> + * Delete the key from the leaf node and recursively delete the key in
> + * inner nodes if necessary.
> + */
> + Assert(NODE_IS_LEAF(stack[level]));
> + while (level >= 0)
> + {
> + rt_node *node = stack[level--];
> +
> + if (NODE_IS_LEAF(node))
> + rt_node_search_leaf(node, key, RT_ACTION_DELETE, NULL);
> + else
> + rt_node_search_inner(node, key, RT_ACTION_DELETE, NULL);
> +
> + /* If the node didn't become empty, we stop deleting the key */
> + if (!NODE_IS_EMPTY(node))
> + break;
> +
> + /* The node became empty */
> + rt_free_node(tree, node);
> + }
>
> Here we call rt_node_search_leaf() twice -- once to check for existence, and
> once to delete. All three search calls are inlined, so this wastes space.
> Let's try to delete the leaf, return if not found, otherwise handle the leaf
> bookkeeping and loop over the inner nodes. This might require some
> duplication of code.

Agreed.

> +ndoe_inner_128_update(rt_node_inner_128 *node, uint8 chunk, rt_node *child)
>
> Spelling

Will fix.
> +static inline void
> +chunk_children_array_copy(uint8 *src_chunks, rt_node **src_children,
> + uint8 *dst_chunks, rt_node **dst_children, int count)
> +{
> + memcpy(dst_chunks, src_chunks, sizeof(uint8) * count);
> + memcpy(dst_children, src_children, sizeof(rt_node *) * count);
> +}
>
> gcc generates better code with something like this (but not hard-coded) at
> the top:
>
> if (count > 4)
> pg_unreachable();

Agreed.

> This would have to change when we implement shrinking of nodes, but might
> still be useful.
>
> + if (!rt_node_search_leaf(node, key, RT_ACTION_FIND, value_p))
> + return false;
> +
> + return true;
>
> Maybe just "return rt_node_search_leaf(...)" ?

Agreed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com