Re: [HACKERS] Custom compression methods

Andres Freund Tue, 16 Mar 2021 11:55:35 -0700

Hi,

On 2021-03-16 10:27:12 -0400, Robert Haas wrote:
> > I'm don't like that after 0002 ExecEvalRow(), ExecEvalFieldStoreForm()
> > contain a nearly identical copy of the same code.  And
> > make_tuple_from_row() also is similar. It seem that there should be a
> > heap_form_tuple() version doing this for us?
> 
> I was worried about having either a performance impact or code
> duplication. The actual plan where you could insert this organically
> is in fill_val(), which is called from heap_fill_tuple(), which is
> called from heap_form_tuple().


Oh, I guess it would make sense to do it that way. However, I was just
thinking of doing the iteration over the tuples that ExecEvalRow() etc
do inside heap_form_flattened_tuple() (or whatever). That'd not be any
worse than what the patch is doing now, just less duplication, and an
easier path towards optimizing it if we notice that we need to?


> If you don't mind passing down 'int flags' or similar to all those,
> and having additional branches to make the behavior dependent on the
> flags, I'm cool with it. Or if you think we should template-ize all
> those functions, that'd be another way to go. But I was afraid I would
> get complaints about adding overhead to hot code paths.

An option for fill_val() itself would probably be fine. It's already an
inline, and if it doesn't get inlined, we could force the compilers hand
with pg_attribute_always_inline.

The harder part would probably be to find a way to deal with the layers
above, without undue code duplication. I think it's not just fill_val()
that'd need to know, but also heap_compute_data_size(),
heap_fill_tuple() - both of which are externally visible (and iirc thus
not going to get inlined with many compiler options, due to symbol
interposition dangers). But we could have a
heap_compute_data_size_internal(bool flatten) that's called by
heap_compute_data_size(). And something similar for heap_form_tuple().

But that's complicated, so I'd just go with the iteration in a
heap_form_tuple() wrapper for now.


> > > I'm open to being convinced that we don't need to do either of these
> > > things, and that the cost of iterating over all varlenas in the tuple
> > > is not so bad as to preclude doing things as you have them here. But,
> > > I'm afraid it's going to be too expensive.
> >
> > I mean, I would just define several of those places away by not caring
> > about tuples in a different compressino formation ending up in a
> > table...
> 
> That behavior feels unacceptable to me from a user expectations point
> of view. I think there's an argument that if I update a tuple that
> contains a compressed datum, and I don't update that particular
> column, I think it would be OK to not recompress the column. But, if I
> insert data into a table, I as a user would expect that the
> compression settings for that column are going to be respected.

IDK. The user might also expect that INSERT .. SELECT is fast, instead
of doing expensive decompression + compression (with pglz the former can
be really slow). I think there's a good argument for having an explicit
"recompress" operation, but I'm not convincd that doing things
implicitly is good, especially if it causes complications in quite a few
places.

Greetings,

Andres Freund

Re: [HACKERS] Custom compression methods

Reply via email to