Hello PG Hackers,

I'd like to submit a patch that implements zstd compression for TOAST data using a 20-byte TOAST pointer format, directly addressing the concerns raised in prior discussions [1 <https://www.postgresql.org/message-id/flat/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO%2Bufq3FyoVyUa5AviFA%40mail.gmail.com#e41c78674adfa4d16b2fa82e59faf9aa>] [2 <https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xvbg7s4vr...@mail.gmail.com>] [3 <https://www.postgresql.org/message-id/flat/[email protected]>].

A bit of background: in the 2022 thread [3 <https://www.postgresql.org/message-id/flat/[email protected]>], Robert Haas suggested: "we had better reserve the fourth bit pattern for something extensible e.g. another byte or several to specify the actual method", i.e. something like:

    00 = PGLZ
    01 = LZ4
    10 = reserved for future emergencies
    11 = extended header with additional type byte

Michael also asked whether we should have "something a bit more extensible for the design of an extensible varlena header."

This patch implements that idea. The format:

    struct varatt_external_extended
    {
        int32   va_rawsize;     /* same as legacy */
        uint32  va_extinfo;     /* cmid=3 signals extended format */
        uint8   va_flags;       /* feature flags */
        uint8   va_data[3];     /* va_data[0] = compression method */
        Oid     va_valueid;     /* same as legacy */
        Oid     va_toastrelid;  /* same as legacy */
    };

*A few notes:*

- Zstd only applies to external TOAST, not inline compression. The 2-bit limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine anyway. Zstd's wins show up on larger values.
- A GUC, use_extended_toast_header, controls whether pglz/lz4 also use the 20-byte format. It defaults to off for compatibility; enable it if you want all external pointers to use the same format.
- Legacy 16-byte pointers continue to work - we check the vartag to determine which format to read.

The 4 extra bytes per pointer are negligible for typical TOAST data sizes, and they give us room to grow.

Regards,
Dharin
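
P.S. For anyone reading without opening the attachment, here is a minimal standalone sketch of the layout idea. It is not code from the patch. The struct fields follow the format described in this mail, and the 30-bit size / 2-bit method split of va_extinfo follows how existing PostgreSQL packs that field. Everything else (legacy_ptr, extended_ptr, make_extinfo, extinfo_size, extinfo_cmid, has_method_byte, and especially the value chosen for EXT_METHOD_ZSTD) is a hypothetical name or value picked for illustration. The patch itself looks at the vartag first; this sketch only shows the cmid=3 escape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;               /* stand-in for PostgreSQL's Oid */

/* va_extinfo: low 30 bits = external size, high 2 bits = method id */
#define EXTSIZE_BITS    30
#define EXTSIZE_MASK    ((1U << EXTSIZE_BITS) - 1)
#define CMID_PGLZ       0U
#define CMID_LZ4        1U
#define CMID_EXTENDED   3U          /* fourth bit pattern: see va_data[0] */
#define EXT_METHOD_ZSTD 2           /* ASSUMPTION: illustrative value only */

typedef struct
{
    int32_t  va_rawsize;
    uint32_t va_extinfo;
    Oid      va_valueid;
    Oid      va_toastrelid;
} legacy_ptr;

typedef struct
{
    int32_t  va_rawsize;            /* same as legacy */
    uint32_t va_extinfo;            /* cmid=3 signals extended format */
    uint8_t  va_flags;              /* feature flags */
    uint8_t  va_data[3];            /* va_data[0] = compression method */
    Oid      va_valueid;            /* same as legacy */
    Oid      va_toastrelid;         /* same as legacy */
} extended_ptr;

/* The two payload sizes quoted in the mail. */
static_assert(sizeof(legacy_ptr) == 16, "legacy pointer payload is 16 bytes");
static_assert(sizeof(extended_ptr) == 20, "extended pointer payload is 20 bytes");

/* Pack an external size and a 2-bit method id into one 32-bit word. */
static uint32_t
make_extinfo(uint32_t extsize, uint32_t cmid)
{
    return (extsize & EXTSIZE_MASK) | (cmid << EXTSIZE_BITS);
}

static uint32_t
extinfo_size(uint32_t extinfo)
{
    return extinfo & EXTSIZE_MASK;
}

static uint32_t
extinfo_cmid(uint32_t extinfo)
{
    return extinfo >> EXTSIZE_BITS;
}

/* True when the real method must be read from va_data[0]. */
static bool
has_method_byte(const extended_ptr *p)
{
    return extinfo_cmid(p->va_extinfo) == CMID_EXTENDED;
}
```

A reader would call has_method_byte() on an extended pointer and, if it returns true, dispatch on va_data[0] rather than on the 2-bit id. The remaining two bytes of va_data and the va_flags byte are the "room to grow" mentioned above.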
zstd-toast-compression-external.patch
Description: Binary data
