Hi Nikita,

> this part of the PostgreSQL screams to be revised and improved
I completely agree. The problem with TOAST pointers is that they are not
extensible at the moment, which prevents adding new compression algorithms
(e.g. ZSTD), new features like compression dictionaries [1], etc. I suggest
we add extensibility in order to solve this problem for the foreseeable
future for everyone.

> where Custom TOAST Pointer is distinguished from Regular one by va_flag field
> which is a part of varlena header

I don't think that the varlena header is the best place to distinguish a
classical TOAST pointer from an extended one. On top of that, I don't see any
free bits that would allow adding such a flag to the on-disk varlena
representation [2].

The current on-disk TOAST pointer representation is as follows:

```
typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```

Note that currently only 2 compression methods are supported:

```
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_INVALID_COMPRESSION_ID = 2
} ToastCompressionId;
```

I suggest adding a new flag that will mark an extended TOAST format:

```
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_RESERVED_COMPRESSION_ID = 2,
    TOAST_HAS_EXTENDED_FORMAT = 3,
} ToastCompressionId;
```

For an extended format we add a varint (utf8-like) bitmask right after
varatt_external that marks the features supported in this particular
instance of the pointer. The rest of the data is interpreted depending on
the bits set. This will allow us to extend the pointers indefinitely.

Note that the proposed approach doesn't require running any migrations.
Note also that I described only the on-disk representation.
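
To make the extended format a bit more concrete, here is a rough sketch of
how a reader could detect it and decode the bitmask. It is only an
illustration under several assumptions, not a proposed patch: it assumes the
compression method ID sits in the upper 2 bits of va_extinfo, the varint
uses a continuation-bit scheme (7 payload bits per byte, high bit set when
another byte follows), which is just one possible "utf8-like" encoding, and
the feature bits and all function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the compression method ID is stored above the low 30 bits. */
#define EXTSIZE_BITS                30

/* Marker value proposed in this email. */
#define TOAST_HAS_EXTENDED_FORMAT   3

/* Hypothetical feature bits; nothing in the proposal defines them yet. */
#define TOAST_FEATURE_ZSTD          (UINT64_C(1) << 0)
#define TOAST_FEATURE_DICTIONARY    (UINT64_C(1) << 1)

/* Maximum number of bytes needed to encode a 64-bit mask. */
#define TOAST_MASK_MAX_BYTES        10

/* Returns nonzero if va_extinfo marks the pointer as extended. */
static int
toast_is_extended(uint32_t va_extinfo)
{
    return (va_extinfo >> EXTSIZE_BITS) == TOAST_HAS_EXTENDED_FORMAT;
}

/*
 * Encode mask into buf, least significant 7-bit group first.
 * buf must have room for TOAST_MASK_MAX_BYTES bytes.
 * Returns the number of bytes written.
 */
static size_t
toast_encode_feature_mask(uint64_t mask, uint8_t *buf)
{
    size_t      n = 0;

    assert(buf != NULL);
    do
    {
        uint8_t     b = (uint8_t) (mask & 0x7F);

        mask >>= 7;
        if (mask != 0)
            b |= 0x80;          /* more bytes follow */
        buf[n++] = b;
    } while (mask != 0);

    return n;
}

/*
 * Decode a mask from at most len bytes of buf.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than TOAST_MASK_MAX_BYTES.
 */
static size_t
toast_decode_feature_mask(const uint8_t *buf, size_t len, uint64_t *mask)
{
    uint64_t    result = 0;
    size_t      i;

    for (i = 0; i < len && i < TOAST_MASK_MAX_BYTES; i++)
    {
        result |= (uint64_t) (buf[i] & 0x7F) << (7 * i);
        if ((buf[i] & 0x80) == 0)
        {
            *mask = result;
            return i + 1;
        }
    }
    return 0;
}
```

With such a scheme a pointer that uses only the first few feature bits pays
a single extra byte, and the mask can grow later without changing how old
pointers are read.
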
We can tweak the in-memory representation as we want without affecting the
end user.

Thoughts?

[1]: https://commitfest.postgresql.org/43/3626/
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/postgres.h;h=0446daa0e61722067bb75aa693a92b38736e12df;hb=164d174bbf9a3aba719c845497863cd3c49a3ad0#l178

-- 
Best regards,
Aleksander Alekseev