On Tue, Mar 16, 2021 at 2:54 PM Andres Freund <and...@anarazel.de> wrote: > Oh, I guess it would make sense to do it that way. However, I was just > thinking of doing the iteration over the tuples that ExecEvalRow() etc > do inside heap_form_flattened_tuple() (or whatever). That'd not be any > worse than what the patch is doing now, just less duplication, and an > easier path towards optimizing it if we notice that we need to?
It's a question of whether you copy the datum array. I don't think a generic function can assume that it's OK to scribble on the input array, or if it does, that'd better be very prominently mentioned in the comments. And copying into a new array has its own costs. 0002 is based on the theory that scribbling on the executor's array won't cause any problem, which I *think* is true, but isn't correct in all cases (e.g. if the input data is coming from a slot). If we pass a flag down to fill_val() and friends then we don't end up having to copy the arrays over so the problem goes away in that design. > The harder part would probably be to find a way to deal with the layers > above, without undue code duplication. I think it's not just fill_val() > that'd need to know, but also heap_compute_data_size(), > heap_fill_tuple() - both of which are externally visible (and iirc thus > not going to get inlined with many compiler options, due to symbol > interposition dangers). But we could have a > heap_compute_data_size_internal(bool flatten) that's called by > heap_compute_data_size(). And something similar for heap_form_tuple(). Hmm, yeah, that's not great. I guess there's nothing expensive we need to repeat - I think anyway - because we should be able to get the uncompressed size from the TOAST pointer itself. But the code would have to know to do that, as you say. -- Robert Haas EDB: http://www.enterprisedb.com