On Tue, Jul 15, 2025 at 10:37:02PM -0700, Nikhil Kumar Veldanda wrote:
> 0001 – pg_compression_available()
> pg_compression_available() in misc.c feels sensible;
Actually, I have taken a step back on this one and recalled that the
list of values available for an enum GUC is already available in
pg_settings.
Hi Michael,
On Tue, Jul 15, 2025 at 9:44 PM Michael Paquier wrote:
>
> I have no idea yet about the fate of the other TOAST patches I have
> proposed for this commit fest, but let's move on with this part of the
> refactoring by splitting the TOAST regression tests for LZ4 and pglz,
> with the new
On Wed, Jun 11, 2025 at 11:42:02AM +0900, Michael Paquier wrote:
> The split of the tests is not completely clean as presented in your
> patch, though. Your patch only does a copy-paste of the original
> file. Some of the basic tests of compression.sql check the
> interactions between the use of
On Thu, Jun 05, 2025 at 12:03:49AM -0700, Nikhil Kumar Veldanda wrote:
> Agreed. I introduced pg_compression_available(text) and refactored the
> SQL tests accordingly. I split out LZ4 tests into compression_lz4.sql
> and created compression_zstd.sql with the appropriate differences.
>
> v25-0001-
On Tue, May 27, 2025 at 02:59:17AM -0700, Nikhil Kumar Veldanda wrote:
> typedef struct varatt_external
> {
>     int32   va_rawsize;  /* Original data size (includes header) */
>     uint32  va_extinfo;  /* External size (without header) and
>                           * compression method */
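For context, va_extinfo packs two values into a single uint32: the low
30 bits hold the external size and the top two bits the compression
method. A minimal standalone sketch of that packing, modeled on the
VARATT_EXTERNAL_* macros in varatt.h (helper names are illustrative):

```
#include <stdint.h>

/* Low 30 bits: external (compressed) size; top 2 bits: compression
 * method. Two bits leave room for only four methods, which is the
 * "last free bit" pressure discussed later in this thread. */
#define EXTSIZE_BITS 30
#define EXTSIZE_MASK ((1U << EXTSIZE_BITS) - 1)

static inline uint32_t
extinfo_pack(uint32_t extsize, uint32_t method)
{
    return (extsize & EXTSIZE_MASK) | (method << EXTSIZE_BITS);
}

static inline uint32_t
extinfo_extsize(uint32_t extinfo)
{
    return extinfo & EXTSIZE_MASK;
}

static inline uint32_t
extinfo_method(uint32_t extinfo)
{
    return extinfo >> EXTSIZE_BITS;
}
```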
On Wed, May 7, 2025 at 5:38 PM Michael Paquier wrote:
> Yes, I was wondering if this is not the most natural approach in terms
> of structure once we plug an extra byte into the varlena header, if
> all the bits of va_extinfo for the compression information are used.
> Having all the bits may not
On Wed, May 07, 2025 at 11:40:14AM +0300, Nikita Malakhov wrote:
> Michael, what do you think of this approach (extending varatt_external)
> vs extending varatt itself with a new tag and structure?
I'm reserved on that. What I'm afraid of here is more complications in
the backend code, because we have quite
On Wed, May 07, 2025 at 04:39:17PM -0700, Nikhil Kumar Veldanda wrote:
> In patch v21, va_compressed.va_data points to varatt_cmp_extended, so
> adding it isn’t strictly necessary. If we do want to fold it into the
> varattrib_4b union, we could define it like this:
>
> ```
> typedef union
> {
> ```
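The archived preview cuts the union definition short. A hedged
reconstruction of what folding the extended member into varattrib_4b
could look like, combining the stock va_4byte/va_compressed members
with the extended struct quoted later in this thread (backend context
assumed for uint32 and FLEXIBLE_ARRAY_MEMBER; not the patch's verbatim
code):

```
typedef union
{
    struct      /* Normal varlena (4-byte length) */
    {
        uint32  va_header;
        char    va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct      /* Compressed-in-line format */
    {
        uint32  va_header;
        uint32  va_tcinfo;      /* Original size and compression method */
        char    va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed;
    struct      /* Extended compression format (sketch) */
    {
        uint32  va_header;
        uint32  va_tcinfo;
        uint32  va_cmp_alg;     /* compression algorithm id */
        uint32  va_cmp_dictid;  /* dictionary OID, when used */
        char    va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;
```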
Hi Michael, Thanks for the feedback.
On Wed, May 7, 2025 at 12:49 AM Michael Paquier wrote:
>
> I have been reading 0001 and I'm finding that the integration does not
> seem to fit much with the existing varatt_external, making the whole
> result slightly confusing. A simple thing: the last bit
Hi Robert,
On Mon, May 5, 2025 at 8:07 AM Robert Haas wrote:
> I don't understand why we need this. I don't see why we need any sort
> of generalized concept of metadata at all here. The zstd-dict
> compression method needs to store a four-byte OID, so let it do that.
> But we don't need to brand
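Taken literally, Robert's suggestion reduces to a layout like this
sketch (names are illustrative and not from any posted patch):

```
#include <stdint.h>

typedef uint32_t Oid;   /* stand-in for PostgreSQL's Oid typedef */

/* zstd-dict stores exactly the one thing it needs: the 4-byte OID of
 * the dictionary, followed by the zstd frame; no generalized metadata
 * header and no metadata-length field. */
typedef struct
{
    Oid  dictid;    /* dictionary used to compress this datum */
    char data[];    /* zstd-compressed payload */
} zstd_dict_datum;
```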
Hi!
Michael, what do you think of this approach (extending varatt_external)
vs extending varatt itself with a new tag and structure? The second
approach allows more flexibility and independence from the existing
structure without modifying varattrib_4b, and is further extensible. I
mentioned it above (extending the
On Sun, May 04, 2025 at 05:54:34AM -0700, Nikhil Kumar Veldanda wrote:
> 3. Resulting on-disk layouts for zstd
>
> ZSTD (nodict) — datum on-disk layout
>
> +----------------------+
> | va_header (uint32)   |
> +----------------------+
> | va_tcinfo (uint32)
On Sun, May 4, 2025 at 8:54 AM Nikhil Kumar Veldanda
wrote:
> I agree. Each compression algorithm can decide its own metadata size
> overhead. Callbacks can provide this information as well, rather than
> storing it in fixed-length bytes (3 bytes). The revised patch introduces a
> "toast_cmpid_meta_size
Hi Robert,
> But I don't quite understand the point of this
> response: it seems like you're just restating what the design does
> without really justifying it. The question here isn't whether a 3-byte
> header can describe a length up to 16MB; I think we all know our
> powers of two well enough to
On Mon, Apr 28, 2025 at 5:32 PM Nikhil Kumar Veldanda
wrote:
> Thanks for raising that question. The idea behind including a 24-bit
> length field alongside the 1-byte algorithm ID is to ensure that each
> compressed datum self-describes its metadata size. This allows any
> compression algorithm to embed variable-length metadata (up to 16 MB)
Hi,
Nikhil, please consider existing discussions on using dictionaries
(mentioned above by Aleksander) and extending the TOAST pointer [1];
it seems you did not check them.
The same question Robert asked above - it's unclear why the header
wastes so much space. You mentioned metadata length - what
Hi Robert,
Thanks for raising that question. The idea behind including a 24-bit
length field alongside the 1-byte algorithm ID is to ensure that each
compressed datum self-describes its metadata size. This allows any
compression algorithm to embed variable-length metadata (up to 16 MB)
without the
On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
wrote:
> a. 24 bits for length → per-datum compression algorithm metadata is
> capped at 16 MB, which is far more than any realistic compression
> header.
> b. 8 bits for algorithm id → up to 256 algorithms.
> c. Zero-overhead when unused if a
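Read concretely, items (a) and (b) describe one 32-bit word: 8 bits of
algorithm id above a 24-bit metadata length. A small sketch of that
packing (helper names are illustrative):

```
#include <stdint.h>

/* 8-bit algorithm id in the high byte, 24-bit metadata length (up to
 * 16 MB - 1) in the low three bytes. */
static inline uint32_t
cmp_meta_pack(uint8_t algid, uint32_t meta_len)
{
    return ((uint32_t) algid << 24) | (meta_len & 0x00FFFFFFU);
}

static inline uint8_t
cmp_meta_algid(uint32_t hdr)
{
    return (uint8_t) (hdr >> 24);
}

static inline uint32_t
cmp_meta_len(uint32_t hdr)
{
    return hdr & 0x00FFFFFFU;
}
```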
Hi Michael,
Thanks for the suggestions. I agree that we should first solve the
“last free bit” problem in varattrib_4b compression bits before
layering on any features. Below is the approach I’ve prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain-ZSTD (nodict
On Wed, Apr 23, 2025 at 11:59:26AM -0400, Robert Haas wrote:
> That's nice to know, but I think the key question is not so much what
> the feature costs when it is used but what it costs when it isn't
> used. If we implement a system where we don't let
> dictionary-compressed zstd datums leak out of
On Wed, Apr 23, 2025 at 11:59 AM Robert Haas wrote:
> heap_toast_insert_or_update care about HeapTupleHasExternal(), which
> seems like it might be a key point.
Care about HeapTupleHasVarWidth, rather.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Apr 21, 2025 at 8:52 PM Nikhil Kumar Veldanda
wrote:
> After reviewing the email thread you attached to the previous response, I
> identified a natural choke point for both inserts and updates: the
> call to "heap_toast_insert_or_update" inside
> heap_prepare_insert/heap_update. In the current
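As a rough illustration of a check at that choke point, the sketch
below assumes backend context; VARATT_IS_COMPRESSED,
toast_get_compression_id(), and detoast_attr() exist today, while
destination_supports_method() is a hypothetical stand-in for the
patch's actual policy:

```
/* Sketch only: before toasting, decompress ("flatten") any
 * inline-compressed datum whose method the destination table cannot
 * handle, so dictionary-compressed values never leak into unrelated
 * tables via CTAS or INSERT ... SELECT. */
static Datum
flatten_foreign_compression(Datum value)
{
    struct varlena *attr = (struct varlena *) DatumGetPointer(value);

    if (VARATT_IS_COMPRESSED(attr) &&
        !destination_supports_method(toast_get_compression_id(attr)))
        return PointerGetDatum(detoast_attr(attr)); /* decompressed copy */

    return value;
}
```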
Hi,
On 2025-04-18 12:22:18 -0400, Robert Haas wrote:
> On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
> wrote:
> > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT
> > ...)
> >
> > As compressed datums can be copied to other unrelated tables via CTAS,
> > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
Hi Michael,
Thanks for the feedback and the suggested patch sequence. I completely
agree: we must minimize storage overhead when dictionaries aren’t used,
while ensuring varattrib_4b remains extensible enough to handle future
compression metadata beyond dictionary ID (for other algorithms). I’ll
ex
Hi Robert,
Thank you for your feedback on the patch. You’re right that my
proposed design will introduce more dictionary dependencies as
dictionaries grow, I chose this path specifically to avoid changing
existing system behavior and prevent perf regressions in CTAS and
related commands.
After re
On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> I think we could add plain-old zstd compression without really
> tackling this issue, but if we are going to add dictionaries then I
> think we might need to revisit the idea of preventing things from
> leaking out of tables. What I can
On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
wrote:
> Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT
> ...)
>
> As compressed datums can be copied to other unrelated tables via CTAS,
> INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> mechanism
On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
wrote:
> struct      /* Extended compression format */
> {
>     uint32  va_header;
>     uint32  va_tcinfo;
>     uint32  va_cmp_alg;
>     uint32  va_cmp_dictid;
>     char    va_data[FLEXIBLE_ARRAY_MEMBER];
>
Hi Nikhil,
Many thanks for working on this. I proposed a similar patch some time
ago [1] but the overall feedback was somewhat mixed, so I chose to
focus on something else. Thanks for picking this up.
> test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> build_zstd_dict_for_attribute
06.03.2025 19:29, Nikhil Kumar Veldanda wrote:
> Hi,
>
>> Overall idea is great.
>>
>> I just want to mention LZ4 also has an API to use a dictionary. Its dictionary
>> will be as simple as "virtually prepended" text (in contrast to complex
>> ZStd dictionary format).
>>
>> I mean, it would be great if
On Thu, 6 Mar 2025 at 08:43, Nikhil Kumar Veldanda
wrote:
>
> Hi all,
>
> The ZStandard compression algorithm [1][2], though not currently used for
> TOAST compression in PostgreSQL, offers significantly improved compression
> ratios compared to lz4/pglz in both dictionary-based and non-dictionary
Hi,
I reviewed the discussions, and while most agreements focused on
changes to the toast pointer, the design I propose requires no
modifications to it. I’ve carefully considered the design choices made
previously, and I recognize Zstd’s clear advantages in compression
efficiency and performance over
Hi Robert,
> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?
Compression dictionaries give a good compression ratio (~50%) and also
increase TPS a bit (5-10%) due to better buffer cache utilization.
Hi Nikhil,
> Thank you for highlighting the previous discussion—I reviewed [1]
> closely. While both methods involve dictionary-based compression, the
> approach I'm proposing differs significantly.
>
> The previous method explicitly extracted string values from JSONB and
> assigned unique OIDs to
Hi Tom,
On Thu, Mar 6, 2025 at 11:33 AM Tom Lane wrote:
>
> Robert Haas writes:
> > On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> > wrote:
> >> Notably, this is the first compression algorithm for Postgres that can
> >> make use of a dictionary to provide higher levels of compression
Hi
On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev
wrote:
>
> Hi Nikhil,
>
> Many thanks for working on this. I proposed a similar patch some time
> ago [1] but the overall feedback was somewhat mixed, so I chose to
> focus on something else. Thanks for picking this up.
>
> > test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
Hi Robert,
> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?
With the latest patch I've shared, using a Kaggle dataset of
Nintendo-related tweets [1], we leveraged PostgreSQL's
acquire_sample_rows
Hi,
> Overall idea is great.
>
> I just want to mention LZ4 also has an API to use a dictionary. Its dictionary
> will be as simple as "virtually prepended" text (in contrast to complex
> ZStd dictionary format).
>
> I mean, it would be great if "dictionary" were a common property for
> different algorithms
Robert Haas writes:
> On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> wrote:
>> Notably, this is the first compression algorithm for Postgres that can make
>> use of a dictionary to provide higher levels of compression, but
>> dictionaries have to be generated and maintained,
> I think
On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
wrote:
> Notably, this is the first compression algorithm for Postgres that can make
> use of a dictionary to provide higher levels of compression, but dictionaries
> have to be generated and maintained,
I think that solving the problems around using a dictionary is going
to be really hard.
Hi Yura,
> So, to support "super-fast" mode you have to accept negative compression
> levels. I didn't check; probably you already support them?
>
The key point I want to emphasize is that both zstd compression levels
and dictionary size should be configurable based on user preferences
at attribute level.
06.03.2025 08:32, Nikhil Kumar Veldanda wrote:
> Hi all,
>
> The ZStandard compression algorithm [1][2], though not currently used for
> TOAST compression in PostgreSQL, offers significantly improved compression
> ratios compared to lz4/pglz in both dictionary-based and non-dictionary
> modes. Attached