Andres asked me off-list for comments on 0026, so here goes. As a general comment, I think the patches could really benefit from more meaningful commit messages and more comments on individual functions. It would definitely help me review, and it might help other people review, or modify the code later. For example, I'm looking at ExprEvalStep. If the intent here is that we don't want the union members to point to data that might differ from one execution of the plan to the next, it's surely important to mention that and explain to people who are trying to add steps later what they should do instead. But I'm also not entirely sure that's the intended rule. It kind of surprises me that the only things that we'd be pointing to here that would fall into that category would be a bool, a NullableDatum, a NullableDatum array, and a FunctionCallInfo ... but I've been surprised by a lot of things that turned out to be true.
I am not a huge fan of the various Rel[Whatever] typedefs. I am not sure that's really adding any clarity. On the other hand I would be a big fan of renaming the structure members in some systematic way. This kind of thing doesn't sit well with me: - NullableDatum *value; /* value to return */ + RelNullableDatum value; /* value to return */ Well, if NullableDatum was the value to return, then RelNullableDatum isn't. It's some kind of thing that lets you find the value to return. Actually that's not really right either, because before 'value' was a pointer to the value to return and the corresponding isnull flag, and now it is a way of finding that stuff. I don't know exactly what to do here to keep the comment comprehensible and not unreasonably long, but I don't think not changing at it all is the thing. Nor do I think just having it be called 'value' when it's clearly not the value, nor even a pointer to the value, is as clear as I would like to be. I wonder if ExprBuilderAllocBool ought to be using sizeof(bool) rather than sizeof(NullableDatum). Is it true that allocno is only used for, err, some kind of LLVM thing, and not in the regular interpreted path? As far as I can see, outside of the LLVM code, we only ever test whether it's 0, and don't actually care about the specific value. I hope that the fact that this patch reverses the order of the first two arguments to ExecInitExprRec is only something you did to make it so that the compiler would find places you needed to update. Because otherwise it makes no sense to introduce a new thing called an ExprStateBuilder in 0017, make it an argument to that function, and then turn around and change the signature again in 0026. Anyway, a final patch shouldn't include this kind of churn. + offsetof(ExprState, steps) * esb->steps_len * sizeof(ExprEvalStep) + + state->mutable_off = offsetof(ExprState, steps) * esb->steps_len * sizeof(ExprEvalStep); Well, either I'm confused here, or the first * should be a + in each case. I wonder how this works at all. + /* copy in step data */ + { + ListCell *lc; + int off = 0; + + foreach(lc, esb->steps) + { + memcpy(&state->steps[off], lfirst(lc), sizeof(ExprEvalStep)); + off++; + } + } This seems incredibly pointless to me. Why use a List in the first place if we're going to have to flatten it using this kind of code? I think stuff like RelFCIOff() and RelFCIIdx() and RelArrayIdx() is just pretty much incomprehensible. Now, the executor is full of badly-named stuff already -- ExecInitExprRec being a fine example of a name nobody is going to understand on first reading, or maybe ever -- but we ought to try not to make things worse. I also do understand that anything with relative pointers is bound to involve a bunch of crappy notation that we're just going to have to accept as the price of doing business. But it would help to pick names that are not so heavily abbreviated. Like, if RelFCIIdx() were called find_function_argument_in_relative_fcinfo() or even get_fnarg_from_relfcinfo() the casual reader might have a chance of guessing what it does. Sure, the code might be longer, but if you can tell what it does without cross-referencing, it's still better. I would welcome changes that make it clearer which things happen just once and which things happen at execution time; that said, it seems like RELPTR_RESOLVE() happens at execution time, and it surprises me a bit that this is OK from a performance perspective. The pointers can change from execution to execution, but not within an individual execution, or so I think. So it doesn't need to be resolved every time, if somehow that can be avoided. But maybe CPUs are sufficiently well-optimized for computing a pointer address as a+b*c that it does not matter. I'm not sure how helpful any of these comments are, but those are my initial thoughts. -- Robert Haas EDB: http://www.enterprisedb.com