On Wed, May 7, 2025 at 5:38 PM Michael Paquier <mich...@paquier.xyz> wrote:
> Yes, I was wondering if this is not the most natural approach in terms
> of structure once if we plug an extra byte into the varlena header if
> all the bits of va_extinfo for the compression information are used.
> Having all the bits may not mean that this necessarily means that the
> information would be cmp_data all the time, just that this a natural
> option when plugging in a new compression method in the new byte
> available.
>
Thanks for reviewing and providing feedback on the patch.

Regarding varatt_external, and specifically storing the compression method in one byte for extended compression methods on external on-disk datums, here’s the proposal. Core checks the compression method of an external on-disk datum in 3 places, all of them trivial. My previous proposal only marked 0b11 in the top bits of va_extinfo, then fetched the externally stored chunks and formed a varattrib_4b to find the compression method ID for extended compression methods. However, I understand why embedding the method byte directly is clearer.

```
typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID within TOAST table */
    Oid         va_toastrelid;  /* OID of TOAST table containing it */
    /* -------- optional trailer -------- */
    union
    {
        struct                  /* compression-method trailer */
        {
            uint8       va_ecinfo;  /* Extended-compression-method info */
        }           cmp;
    }           extended;       /* “extended” = optional byte */
} varatt_external;
```

I'm proposing not to store algorithm metadata exclusively at the varatt_external level, because that is not always appropriate. A datum may initially qualify for out-of-line storage but become small enough after compression (under the 2KB threshold for the extended storage type) that it no longer meets the criteria for external storage. In that case it cannot use a TOAST pointer and must be stored in-line instead. Given this behavior, it is more robust to store metadata at the varattrib_4b level, so that it remains accessible regardless of whether the datum ends up stored in-line or externally. Moreover, detoasting first fetches the external data, reconstructs it into a varattrib_4b, and then decompresses, so keeping metadata in varattrib_4b matches that flow.

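To make the lookup concrete, here is a minimal standalone sketch (not code from the patch) of how a reader could resolve the compression method from a TOAST pointer. It assumes that 0b11 in the top two bits of va_extinfo still marks an extended method, that the low 30 bits hold the external size as in current PostgreSQL, and that va_ecinfo uses the bit layout described below (bits 7-1 = cmid - 2, bit 0 = metadata flag). The names DecodedToastInfo, encode_ecinfo and decode_toast_info are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror PostgreSQL's VARLENA_EXTSIZE_BITS / VARLENA_EXTSIZE_MASK. */
#define EXTSIZE_BITS 30
#define EXTSIZE_MASK ((1U << EXTSIZE_BITS) - 1)

/* Assumption: 0b11 in the top two bits means "method is in va_ecinfo". */
#define EXTENDED_CMID_MARKER 3U

/* Hypothetical result type, for illustration only. */
typedef struct DecodedToastInfo
{
    uint32_t    extsize;        /* external size, without header */
    uint32_t    cmid;           /* resolved compression method ID */
    bool        has_meta;       /* does the algorithm expect metadata? */
} DecodedToastInfo;

/* Pack an extended method ID (2..129) and the metadata flag into one byte. */
uint8_t
encode_ecinfo(uint32_t cmid, bool has_meta)
{
    assert(cmid >= 2 && cmid <= 129);
    return (uint8_t) (((cmid - 2) << 1) | (has_meta ? 1U : 0U));
}

/*
 * Resolve size, method and metadata flag.  va_ecinfo is only consulted
 * when the top bits of va_extinfo carry the extended marker.
 */
DecodedToastInfo
decode_toast_info(uint32_t va_extinfo, uint8_t va_ecinfo)
{
    DecodedToastInfo d;
    uint32_t    top = va_extinfo >> EXTSIZE_BITS;

    d.extsize = va_extinfo & EXTSIZE_MASK;
    if (top == EXTENDED_CMID_MARKER)
    {
        d.cmid = (uint32_t) (va_ecinfo >> 1) + 2;
        d.has_meta = (va_ecinfo & 1) != 0;
    }
    else
    {
        /* Built-in methods (pglz = 0, lz4 = 1) carry no metadata. */
        d.cmid = top;
        d.has_meta = false;
    }
    return d;
}
```

With these assumptions, a ZSTD datum with a dictionary would carry va_ecinfo = encode_ecinfo(2, true), and decode_toast_info() resolves it without fetching any external chunk, which is the point of embedding the byte in the pointer.
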
This is the layout of the extra byte in both varatt_external and varattrib_4b.

```
bit   7   6   5   4   3   2   1   0
    +---+---+---+---+---+---+---+---+
    |         cmid − 2          | F |
    +---+---+---+---+---+---+---+---+

• Bits 7–1 (cmid − 2) – 7-bit field holding compression IDs:
    raw ∈ [0…127] ⇒ cmid = raw + 2 ([2…129])
• Bit 0 (F) – flag indicating whether the algorithm expects metadata
```

I introduced the metadata flag in the 1-byte layout to prevent zstd from exposing separate dict and nodict types in ToastCompressionId. The flag indicates whether the algorithm expects any metadata. For ZSTD, if the flag is set, a dictid is expected; otherwise, no dictid is present.

```
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_ZSTD_COMPRESSION_ID = 2,
    TOAST_INVALID_COMPRESSION_ID = 3,
} ToastCompressionId;

// varattrib_4b remains unchanged from the previous proposal
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size and method; see
                                 * va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed;
    struct                      /* Extended compressed in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size and method; see
                                 * va_extinfo */
        uint8       va_ecinfo;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;
```

During compression, the compression method (zstd_compress_datum) will decide whether to use metadata (a dictionary) based on CompressionInfo.meta.

Per-column ZSTD compression levels: Since ZSTD supports compression levels (default = 3, up to ZSTD_maxCLevel(), currently 22, plus negative “fast” levels), I’m proposing an option for users to choose their preferred level on a per-column basis via pg_attribute.attoptions. If unset, we’ll use ZSTD’s default:

```
typedef struct AttributeOpts
{
    int32       vl_len_;        /* varlena header (do not touch!) */
    float8      n_distinct;
    float8      n_distinct_inherited;
    int         zstd_level;     /* user-specified ZSTD level */
} AttributeOpts;

ALTER TABLE tblname ALTER COLUMN colname SET (zstd_level = 5);
```

PostgreSQL doesn’t currently expose compression levels for LZ4, so this would be a new kind of knob: a per-column ZSTD level setting that lets users tune the speed/ratio trade-off. I’d like to hear thoughts on this approach.

v24-0001-Design-to-extend-the-varattrib_4b-varatt_externa.patch - Design proposal for varattrib_4b & varatt_external
v24-0002-zstd-nodict-compression.patch - ZSTD no dictionary implementation.

--
Nikhil Veldanda
v24-0002-zstd-nodict-compression.patch
Description: Binary data
v24-0001-Design-to-extend-the-varattrib_4b-varatt_externa.patch
Description: Binary data