> If we add something for NULLs, we'll always add distinctClause before > the sortClause. If the distinctClause is NULL we won't end up with the > same jumble bytes as if the sortClause were NULL, as the order the > NULL byte(s) are added to the buffer changes.
That is how I understand it as well. By appending a single character null terminator to the jumble for every NULL expression, we change the composition of the final jumble. Passing offsets will accomplish the same thing, but I can't see how it buys us any additional safeguards. > I suspect that the overhead will be minimal for all the approaches I'm > seeing on this thread, but it would not hurt to double-check all that. Perhaps my concern is overblown. Also, it seems the consensus is for a more comprehensive solution than that of variant A. Here is an idea I was playing around with. Why don't we just append a null terminator for every expression (a variant of variant B)? I describe this as jumbling the expression position. And if we do that, it no longer seems necessary to jumble the expression type either. right? +++ b/src/backend/nodes/queryjumblefuncs.c @@ -236,6 +236,13 @@ do { \ if (expr->str) \ AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \ } while(0) +#define JUMBLE_POSITION() \ +do { \ + char nullterm = '\0'; \ + AppendJumble(jstate, (const unsigned char *) &(nullterm), sizeof(nullterm)); \ + if (expr == NULL) \ + return; \ +} while(0) #include "queryjumblefuncs.funcs.c" @@ -244,17 +251,15 @@ _jumbleNode(JumbleState *jstate, Node *node) { Node *expr = node; - if (expr == NULL) - return; - /* Guard against stack overflow due to overly complex expressions */ check_stack_depth(); /* - * We always emit the node's NodeTag, then any additional fields that are - * considered significant, and then we recurse to any child nodes. + * We represent a position of a field in the jumble with a null-term. + * Doing so ensures that even NULL expressions are accounted for in + * the jumble. */ - JUMBLE_FIELD(type); + JUMBLE_POSITION(); switch (nodeTag(expr)) { > As the overhead of a single query jumbling is weightless compared to > the overall query processing yeah, that is a good point. At least benchmarking the above on a SELECT only pgbench did not reveal any regression. -- Sami Imseih Amazon Web Services (AWS)