Re: [HACKERS] Custom compression methods

Robert Haas Tue, 16 Mar 2021 12:31:31 -0700

On Tue, Mar 16, 2021 at 2:54 PM Andres Freund <and...@anarazel.de> wrote:
> Oh, I guess it would make sense to do it that way. However, I was just
> thinking of doing the iteration over the tuples that ExecEvalRow() etc
> do inside heap_form_flattened_tuple() (or whatever). That'd not be any
> worse than what the patch is doing now, just less duplication, and an
> easier path towards optimizing it if we notice that we need to?


It's a question of whether you copy the datum array. I don't think a
generic function can assume that it's OK to scribble on the input
array, or if it does, that'd better be very prominently mentioned in
the comments. And copying into a new array has its own costs. 0002 is
based on the theory that scribbling on the executor's array won't
cause any problem, which I *think* is true, but isn't correct in all
cases (e.g. if the input data is coming from a slot). If we pass a
flag down to fill_val() and friends then we don't end up having to
copy the arrays over so the problem goes away in that design.

> The harder part would probably be to find a way to deal with the layers
> above, without undue code duplication. I think it's not just fill_val()
> that'd need to know, but also heap_compute_data_size(),
> heap_fill_tuple() - both of which are externally visible (and iirc thus
> not going to get inlined with many compiler options, due to symbol
> interposition dangers). But we could have a
> heap_compute_data_size_internal(bool flatten) that's called by
> heap_compute_data_size(). And something similar for heap_form_tuple().

Hmm, yeah, that's not great. I guess there's nothing expensive we need
to repeat - I think anyway - because we should be able to get the
uncompressed size from the TOAST pointer itself. But the code would
have to know to do that, as you say.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: [HACKERS] Custom compression methods

Reply via email to