On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:
> I want to make sure I understand your main point: you're OK with a new
> `vartag_external`, but prefer we avoid increasing the heap TOAST pointer
> from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
> the main heap tuple.

That would be my choice, yes.  Not sure about the opinion of others on
this matter.

> I also realize the "compatibility" of the extended header doesn't buy us
> much - we'll need to support the existing 16-byte varatt_external forever
> for backward compatibility. Adding a 20-byte structure just means two
> formats to maintain indefinitely.

Yes.  Patches have to maintain on-disk compatibility.

> A couple clarifying questions if we go with new vartag (e.g.,
> `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
> discriminator
> 1. How should we handle future methods beyond zstd? One tag per method, or
> store a method id elsewhere (e.g., in TOAST chunk header)?

My suspicion is that we could use a new set of vartags in the future,
one for each compression method.  When it comes to zstd, there is
something that comes into play: we could set some bits related to
dictionaries at tuple level.  Not sure if this is the best design or
if an attribute-level option is better suited (for example, a
dictionary of common keys could be attached to a JSONB attribute,
saving a lot of on-disk space even before compression), but keeping
some bits free in the 16-byte header leaves this option open with a
new vartag_external.

That said, zstd is good enough that I strongly suspect that we would
not regret it for quite a few years.  One issue that pushed towards
the addition of lz4 as an option for TOAST compression is that pglz
was worse in terms of CPU cost.  zlib is also more expensive than lz4
or zstd, especially at very high compression levels, usually for
little compression gain.

> 2. And re: "as long as the TOAST value is 32 bits" — are you referring to
> the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
> extsize for method encoding)?

I mean extending the TOAST value to 8 bytes, as per the following
issues:
https://www.postgresql.org/message-id/764273.1669674269%40sss.pgh.pa.us
https://commitfest.postgresql.org/patch/5830/

> *Key findings (i guess well known at this point):*
> - ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
> - For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
> - The T4 result showing zstd as "worse" is not about compression quality -
> it's about missing inline storage support. ZSTD actually compresses better,
> but pays unnecessary TOAST overhead.
>
> I'll share the detailed benchmark script with the next patch revision. But
> also a potential path forward could be that we could just fully replace
> pglz (can bring it up later in different thread)

I don't think that we will ever be able to remove pglz.  It would be
nice as a final result, of course, but I also expect that not being
able to decompress pglz data would lead to a lot of user pain.  It
would also be very expensive to check at upgrade for large instances.

> *On Testing and Patch Structure*
> Agreed on both points:
> - I'll use `compression_zstd.sql` following the `compression_lz4.sql`
> pattern (removing the test_toast_ext module)

Okay.

> - I'll split the GUC refactoring into a separate preparatory patch

This refactoring, if done nicely, is worth an independent piece.  It's
something that I have actually done for the sake of the other thread,
though the result was not really much liked by others.  Perhaps I'm
just lacking imagination with this abstraction, and I'd surely welcome
different ideas.
--
Michael
