Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-07-16 Thread Michael Paquier
On Tue, Jul 15, 2025 at 10:37:02PM -0700, Nikhil Kumar Veldanda wrote: > 0001 – pg_compression_available() > pg_compression_available() in misc.c feels sensible; Actually, I have taken a step back on this one and recalled that the list of values available for an enum GUC are already available in p

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-07-15 Thread Nikhil Kumar Veldanda
Hi Michael, On Tue, Jul 15, 2025 at 9:44 PM Michael Paquier wrote: > > I have no idea yet about the fate of the other TOAST patches I have > proposed for this commit fest, but let's move on with this part of the > refactoring by splitting the TOAST regression tests for LZ4 and pglz, > with the ne

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-07-15 Thread Michael Paquier
On Wed, Jun 11, 2025 at 11:42:02AM +0900, Michael Paquier wrote: > The split of the tests is not completely clean as presented in your > patch, though. Your patch only does a copy-paste of the original > file. Some of the basic tests of compression.sql check the > interactions between the use of

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-06-10 Thread Michael Paquier
On Thu, Jun 05, 2025 at 12:03:49AM -0700, Nikhil Kumar Veldanda wrote: > Agreed. I introduced pg_compression_available(text) and refactored the > SQL tests accordingly. I split out LZ4 tests into compression_lz4.sql > and created compression_zstd.sql with the appropriate differences. > > v25-0001-

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-30 Thread Michael Paquier
On Tue, May 27, 2025 at 02:59:17AM -0700, Nikhil Kumar Veldanda wrote: > typedef struct varatt_external > { > int32 va_rawsize; /* Original data size (includes header) */ > uint32 va_extinfo; /* External size (without header) and > * compression metho

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-27 Thread Nikhil Kumar Veldanda
On Wed, May 7, 2025 at 5:38 PM Michael Paquier wrote: > Yes, I was wondering if this is not the most natural approach in terms > of structure once if we plug an extra byte into the varlena header if > all the bits of va_extinfo for the compression information are used. > Having all the bits may no

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Michael Paquier
On Wed, May 07, 2025 at 11:40:14AM +0300, Nikita Malakhov wrote: > Michael, what do you think of this approach (extending varatt_external) > vs extending varatt itself by new tag and structure? I'm reserved on that. What I'm afraid here is more complications in the backend code because we have qu

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Michael Paquier
On Wed, May 07, 2025 at 04:39:17PM -0700, Nikhil Kumar Veldanda wrote: > In patch v21, va_compressed.va_data points to varatt_cmp_extended, so > adding it isn’t strictly necessary. If we do want to fold it into the > varattrib_4b union, we could define it like this: > > ``` > typedef union > { >

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Nikhil Kumar Veldanda
Hi Michael, Thanks for the feedback. On Wed, May 7, 2025 at 12:49 AM Michael Paquier wrote: > > I have been reading 0001 and I'm finding that the integration does not > seem to fit much with the existing varatt_external, making the whole > result slightly confusing. A simple thing: the last bit

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Nikhil Kumar Veldanda
Hi Robert, On Mon, May 5, 2025 at 8:07 AM Robert Haas wrote: > I don't understand why we need this. I don't see why we need any sort > of generalized concept of metadata at all here. The zstd-dict > compression method needs to store a four-byte OID, so let it do that. > But we don't need to brand

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Nikita Malakhov
Hi! Michael, what do you think of this approach (extending varatt_external) vs extending varatt itself by new tag and structure? The second approach allows more flexibility, independence of existing structure without modifying varatt_4b and is extensible further. I mentioned it above (extending th

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Michael Paquier
On Sun, May 04, 2025 at 05:54:34AM -0700, Nikhil Kumar Veldanda wrote: > 3. Resulting on-disk layouts for zstd > > ZSTD (nodict) — datum on‑disk layout > > +--+ > | va_header (uint32) | > +--+ > | va_tcinfo (uint32)

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-05 Thread Robert Haas
On Sun, May 4, 2025 at 8:54 AM Nikhil Kumar Veldanda wrote: > I agree. Each compression algorithm can decide its own metadata size > overhead. Callbacks can provide this information as well rather than > storing in fixed length bytes(3 bytes). The revised patch introduces a > "toast_cmpid_meta_siz

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-04 Thread Nikhil Kumar Veldanda
Hi Robert > But I don't quite understand the point of this > response: it seems like you're just restating what the design does > without really justifying it. The question here isn't whether a 3-byte > header can describe a length up to 16MB; I think we all know our > powers of two well enough to

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-29 Thread Robert Haas
On Mon, Apr 28, 2025 at 5:32 PM Nikhil Kumar Veldanda wrote: > Thanks for raising that question. The idea behind including a 24-bit > length field alongside the 1-byte algorithm ID is to ensure that each > compressed datum self-describes its metadata size. This allows any > compression algorithm t

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-29 Thread Nikita Malakhov
Hi, Nikhil, please consider existing discussions on using dictionaries (mentioned above by Aleksander) and extending the TOAST pointer [1], it seems you did not check them. The same question Robert asked above - it's unclear why the header wastes so much space. You mentioned metadata length - wha

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-28 Thread Nikhil Kumar Veldanda
Hi Robert, Thanks for raising that question. The idea behind including a 24-bit length field alongside the 1-byte algorithm ID is to ensure that each compressed datum self-describes its metadata size. This allows any compression algorithm to embed variable-length metadata (up to 16 MB) without the

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-28 Thread Robert Haas
On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda wrote: > a. 24 bits for length → per-datum compression algorithm metadata is > capped at 16 MB, which is far more than any realistic compression > header. > b. 8 bits for algorithm id → up to 256 algorithms. > c. Zero-overhead when unused if a

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-25 Thread Nikhil Kumar Veldanda
Hi Michael, Thanks for the suggestions. I agree that we should first solve the “last–free-bit” problem in varattrib_4b compression bits before layering on any features. Below is the approach I’ve prototyped to keep the header compact yet fully extensible, followed by a sketch of the plain-ZSTD(no

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-24 Thread Michael Paquier
On Wed, Apr 23, 2025 at 11:59:26AM -0400, Robert Haas wrote: > That's nice to know, but I think the key question is not so much what > the feature costs when it is used but what it costs when it isn't > used. If we implement a system where we don't let > dictionary-compressed zstd datums leak out o

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-23 Thread Robert Haas
On Wed, Apr 23, 2025 at 11:59 AM Robert Haas wrote: > heap_toast_insert_or_update care about HeapTupleHasExternal(), which > seems like it might be a key point. Care about HeapTupleHasVarWidth, rather. -- Robert Haas EDB: http://www.enterprisedb.com

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-23 Thread Robert Haas
On Mon, Apr 21, 2025 at 8:52 PM Nikhil Kumar Veldanda wrote: > After reviewing the email thread you attached on previous response, I > identified a natural choke point for both inserts and updates: the > call to "heap_toast_insert_or_update" inside > heap_prepare_insert/heap_update. In the current

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-22 Thread Andres Freund
Hi, On 2025-04-18 12:22:18 -0400, Robert Haas wrote: > On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda > wrote: > > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT > > ...) > > > > As compressed datums can be copied to other unrelated tables via CTAS, > > INSERT

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-21 Thread Nikhil Kumar Veldanda
Hi Michael, Thanks for the feedback and the suggested patch sequence. I completely agree—we must minimize storage overhead when dictionaries aren’t used, while ensuring varattrib_4b remains extensible enough to handle future compression metadata beyond dictionary ID (for other algorithms). I’ll ex

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-21 Thread Nikhil Kumar Veldanda
Hi Robert, Thank you for your feedback on the patch. You’re right that my proposed design will introduce more dictionary dependencies as dictionaries grow; I chose this path specifically to avoid changing existing system behavior and prevent perf regressions in CTAS and related commands. After re

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-21 Thread Michael Paquier
On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote: > I think we could add plain-old zstd compression without really > tackling this issue, but if we are going to add dictionaries then I > think we might need to revisit the idea of preventing things from > leaking out of tables. What I can

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-18 Thread Robert Haas
On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda wrote: > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT > ...) > > As compressed datums can be copied to other unrelated tables via CTAS, > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a > m

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-17 Thread Robert Haas
On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda wrote: > struct /* Extended compression format */ > { > uint32 va_header; > uint32 va_tcinfo; > uint32 va_cmp_alg; > uint32 va_cmp_dictid; > char va_data[FLEXIBLE_ARRAY_MEMBER]; >

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-15 Thread Aleksander Alekseev
Hi Nikhil, Many thanks for working on this. I proposed a similar patch some time ago [1] but the overall feedback was somewhat mixed so I chose to focus on something else. Thanks for picking this up. > test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1); > build_zstd_dict_for_attri

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-15 Thread Yura Sokolov
06.03.2025 19:29, Nikhil Kumar Veldanda wrote: > Hi, > >> Overall idea is great. >> >> I just want to mention LZ4 also have API to use dictionary. Its dictionary >> will be as simple as "virtually prepended" text (in contrast to complex >> ZStd dictionary format). >> >> I mean, it would be great i

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-08 Thread Kirill Reshke
On Thu, 6 Mar 2025 at 08:43, Nikhil Kumar Veldanda wrote: > > Hi all, > > The ZStandard compression algorithm [1][2], though not currently used for > TOAST compression in PostgreSQL, offers significantly improved compression > ratios compared to lz4/pglz in both dictionary-based and non-dictiona

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-07 Thread Nikhil Kumar Veldanda
Hi, I reviewed the discussions, and while most agreements focused on changes to the toast pointer, the design I propose requires no modifications to it. I’ve carefully considered the design choices made previously, and I recognize Zstd’s clear advantages in compression efficiency and performance o

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-07 Thread Aleksander Alekseev
Hi Robert, > I think that solving the problems around using a dictionary is going > to be really hard. Can we see some evidence that the results will be > worth it? Compression dictionaries give a good compression ratio (~50%) and also increase TPS a bit (5-10%) due to better buffer cache utiliza

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-07 Thread Aleksander Alekseev
Hi Nikhil, > Thank you for highlighting the previous discussion—I reviewed [1] > closely. While both methods involve dictionary-based compression, the > approach I'm proposing differs significantly. > > The previous method explicitly extracted string values from JSONB and > assigned unique OIDs to

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi Tom, On Thu, Mar 6, 2025 at 11:33 AM Tom Lane wrote: > > Robert Haas writes: > > On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda > > wrote: > >> Notably, this is the first compression algorithm for Postgres that can > >> make use of a dictionary to provide higher levels of compression

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev wrote: > > Hi Nikhil, > > Many thanks for working on this. I proposed a similar patch some time > ago [1] but the overall feedback was somewhat mixed so I chose to > focus on something else. Thanks for picking this up. > > > test=# select bui

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi Robert, > I think that solving the problems around using a dictionary is going > to be really hard. Can we see some evidence that the results will be > worth it? With the latest patch I've shared, using a Kaggle dataset of Nintendo-related tweets [1], we leveraged PostgreSQL's acquire_sample_r

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi, > Overall idea is great. > > I just want to mention LZ4 also have API to use dictionary. Its dictionary > will be as simple as "virtually prepended" text (in contrast to complex > ZStd dictionary format). > > I mean, it would be great if "dictionary" will be common property for > different alg

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Tom Lane
Robert Haas writes: > On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda > wrote: >> Notably, this is the first compression algorithm for Postgres that can make >> use of a dictionary to provide higher levels of compression, but >> dictionaries have to be generated and maintained, > I think

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Robert Haas
On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda wrote: > Notably, this is the first compression algorithm for Postgres that can make > use of a dictionary to provide higher levels of compression, but dictionaries > have to be generated and maintained, I think that solving the problems aro

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi Yura, > So, to support "super-fast" mode you have to accept negative compression > levels. I didn't check, probably you're already support them? > The key point I want to emphasize is that both zstd compression levels and dictionary size should be configurable based on user preferences at attr

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Yura Sokolov
06.03.2025 08:32, Nikhil Kumar Veldanda wrote: > Hi all, > > The ZStandard compression algorithm [1][2], though not currently used for > TOAST compression in PostgreSQL, offers significantly improved compression > ratios compared to lz4/pglz in both dictionary-based and non-dictionary > modes. Att