On Wed, May 07, 2025 at 11:40:14AM +0300, Nikita Malakhov wrote:
> Michael, what do you think of this approach (extending varatt_external)
> vs extending varatt itself by new tag and structure?

I'm reserved on that.  What I'm afraid here is more complications in
the backend code because we have quite a few places where we do
varatt lookups to decide what should be happen, like in PLs, so this 
brings complications of its own for something that could be isolated
behind a varattrib_4b, where detoasting is under control.  The patch
posted at [1] means that the custom area could be anything, how do you
make sure that the backend is able to understand what could be
anything?  I guess that this also depends on the pluggable toast part, 
of course, but I've not studied enough what's been proposed to have a
hard opinion.  If you have very specific pointers, please feel free.

> The second approach
> allows more flexibility, independence of existing structure without
> modifying
> varatt_4b and is extensible further. I mentioned it above (extending
> the TOAST pointer), and it could be implemented more easily and in a less
> confusing way.

If you mean [0], putting an "extended" flag into ToastCompressionId
which is something used now by the internals of TOAST for a
compression method, with ToastCompressionId being limited to have up
to 4 elements in its enum, does not feel right.  In concept, once
extended, this may point to something more than a compression method,
as there's also metadata around the compression method added.  At
least that's what I'm understanding as a possible scenario from all
the proposals in this area.  There's some overlap with
common/compression.h, for example, even if we are never going to care
about gzip in this case, just saying that this has been buzzing me in 
the core code for some time.

One first thing I'd try to do here is to untangle this situation, by
allowing ToastCompressionId to have more extensibility so as we could
use it to track more compression methods, or just perhaps remove it
entirely in a smart way by keeping the information related to the
extra byte and the two bits of va_tcinfo for the compression method
isolated in varatt.h, shaping the code so as adding more compression
methods in the extra byte put after va_tcinfo would be easier once the
surroundings of varattrib_4b are extended.  Without an agreement about
how to use the last bit we have, there's perhaps little point in
aiming for any of that now.

FWIW, extending the area around varattrib_4b feels a natural thing to
do here, and it does not have to overlap with the possibilities around
the varatts.

> I'm +1 on storing dictionary somewhere around actual data (not necessary
> in the data storage area itself) but strongly against new catalog table
> with dictionaries - it involves a lot of side effects, including locks
> while working
> with this table resulting in performance degradation, and so on.

Just wondering.  Have you looked at the potential overhead of doing
computation and decomputation of a dictionnary?  zstd mentions in its
docs that these can easily cause a lot of overhead, hence handling
this stuff without some kind of caching is going to be costly if
performing a lot of chunk decompressions.  It's something that could
be decided later on, of course.  If this area of the code is made
pluggable, then it's up to an extension to just do it.

[0]: 
https://www.postgresql.org/message-id/flat/CAN-LCVMq2X%3Dfhx7KLxfeDyb3P%2BBXuCkHC0g%3D9GF%2BJD4izfVa0Q%40mail.gmail.com
[1]: 
https://www.postgresql.org/message-id/CAN-LCVNxbnpHh4PVUUc9g6dPibE8wZALiLtxcs3TjfivxDkCkA%40mail.gmail.com
--
Michael

Attachment: signature.asc
Description: PGP signature

Reply via email to