Hi, On 2021-03-16 10:27:12 -0400, Robert Haas wrote: > > I'm don't like that after 0002 ExecEvalRow(), ExecEvalFieldStoreForm() > > contain a nearly identical copy of the same code. And > > make_tuple_from_row() also is similar. It seem that there should be a > > heap_form_tuple() version doing this for us? > > I was worried about having either a performance impact or code > duplication. The actual plan where you could insert this organically > is in fill_val(), which is called from heap_fill_tuple(), which is > called from heap_form_tuple().
Oh, I guess it would make sense to do it that way. However, I was just thinking of doing the iteration over the tuples that ExecEvalRow() etc do inside heap_form_flattened_tuple() (or whatever). That'd not be any worse than what the patch is doing now, just less duplication, and an easier path towards optimizing it if we notice that we need to? > If you don't mind passing down 'int flags' or similar to all those, > and having additional branches to make the behavior dependent on the > flags, I'm cool with it. Or if you think we should template-ize all > those functions, that'd be another way to go. But I was afraid I would > get complaints about adding overhead to hot code paths. An option for fill_val() itself would probably be fine. It's already an inline, and if it doesn't get inlined, we could force the compilers hand with pg_attribute_always_inline. The harder part would probably be to find a way to deal with the layers above, without undue code duplication. I think it's not just fill_val() that'd need to know, but also heap_compute_data_size(), heap_fill_tuple() - both of which are externally visible (and iirc thus not going to get inlined with many compiler options, due to symbol interposition dangers). But we could have a heap_compute_data_size_internal(bool flatten) that's called by heap_compute_data_size(). And something similar for heap_form_tuple(). But that's complicated, so I'd just go with the iteration in a heap_form_tuple() wrapper for now. > > > I'm open to being convinced that we don't need to do either of these > > > things, and that the cost of iterating over all varlenas in the tuple > > > is not so bad as to preclude doing things as you have them here. But, > > > I'm afraid it's going to be too expensive. > > > > I mean, I would just define several of those places away by not caring > > about tuples in a different compressino formation ending up in a > > table... > > That behavior feels unacceptable to me from a user expectations point > of view. I think there's an argument that if I update a tuple that > contains a compressed datum, and I don't update that particular > column, I think it would be OK to not recompress the column. But, if I > insert data into a table, I as a user would expect that the > compression settings for that column are going to be respected. IDK. The user might also expect that INSERT .. SELECT is fast, instead of doing expensive decompression + compression (with pglz the former can be really slow). I think there's a good argument for having an explicit "recompress" operation, but I'm not convincd that doing things implicitly is good, especially if it causes complications in quite a few places. Greetings, Andres Freund