Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-07 Thread Nikhil Kumar Veldanda
Hi,

I reviewed the discussions, and while most of the agreements focused on
changes to the toast pointer, the design I propose requires no
modifications to it. I have carefully considered the design choices
made previously, and I recognize Zstd’s clear advantages in compression
efficiency and performance over algorithms like PGLZ and LZ4; we can
integrate it without altering the existing toast pointer
(varatt_external) structure.

By simply using the top two bits of the va_extinfo field (setting them
to '11') in `varatt_external`, we can signal an alternative
compression algorithm, clearly distinguishing new methods from legacy
ones. The specific algorithm used would then be recorded in the
va_cmp_alg field.

This approach addresses the issues raised in the summarized thread [1]
and still lets us leverage dictionaries for data that can stay in-line.
While my initial patch includes modifications to the toast pointer due
to a single dependency (pg_column_compression), those changes aren’t
strictly necessary; resolving that dependency separately would make
the overall design even less intrusive.

Here’s an illustrative structure:
```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Current compressed format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original size and compression method */
        char        va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compression format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        uint32      va_cmp_dictid;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method; '11' in the top two
                                 * bits indicates a new compression method */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```

Decompression flow remains straightforward: once a datum is identified
as external, we detoast it and then identify the compression algorithm
using the `TOAST_COMPRESS_METHOD` macro, which operates on a
varattrib_4b structure rather than a toast pointer. We retrieve the
compression algorithm from either va_tcinfo or va_cmp_alg based on the
adjusted macros, and decompress accordingly.
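
To make that concrete, here is a minimal sketch of the dispatch.
TOAST_COMPRESS_METHOD, pglz_decompress_datum and lz4_decompress_datum
are the existing helpers; VARATT_IS_EXTENDED_COMPRESSION,
TOAST_ZSTD_COMPRESSION_ID and zstd_decompress_datum are illustrative
names for what the patch would add, not its actual code:

```
/*
 * Hedged sketch, not the patch's actual code.  "attr" is the detoasted,
 * still-compressed datum in its varattrib_4b form.
 */
static struct varlena *
toast_decompress_datum_sketch(struct varlena *attr)
{
    ToastCompressionId cmid;

    if (VARATT_IS_EXTENDED_COMPRESSION(attr))   /* top two bits of va_tcinfo == 11 */
        cmid = ((varattrib_4b *) attr)->va_compressed_ext.va_cmp_alg;
    else
        cmid = TOAST_COMPRESS_METHOD(attr);     /* legacy 2-bit encoding */

    switch (cmid)
    {
        case TOAST_PGLZ_COMPRESSION_ID:
            return pglz_decompress_datum(attr);
        case TOAST_LZ4_COMPRESSION_ID:
            return lz4_decompress_datum(attr);
        case TOAST_ZSTD_COMPRESSION_ID:         /* new, illustrative */
            return zstd_decompress_datum(attr);
        default:
            elog(ERROR, "invalid compression method id %d", (int) cmid);
            return NULL;                        /* keep compiler quiet */
    }
}
```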

In summary, integrating Zstandard into the TOAST framework in this
minimally invasive way should yield substantial benefits.

[1] 
https://www.postgresql.org/message-id/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

Best regards,
Nikhil Veldanda

On Fri, Mar 7, 2025 at 3:42 AM Aleksander Alekseev
 wrote:
>
> Hi Nikhil,
>
> > Thank you for highlighting the previous discussion—I reviewed [1]
> > closely. While both methods involve dictionary-based compression, the
> > approach I'm proposing differs significantly.
> >
> > The previous method explicitly extracted string values from JSONB and
> > assigned unique OIDs to each entry, resulting in distinct dictionary
> > entries for every unique value. In contrast, this approach directly
> > leverages Zstandard's dictionary training API. We provide raw data
> > samples to Zstd, which generates a dictionary of a specified size.
> > This dictionary is then stored in a catalog table and used to compress
> > subsequent inserts for the specific attribute it was trained on.
> >
> > [...]
>
> You didn't read closely enough I'm afraid. As Tom pointed out, the
> title of the thread is misleading. On top of that there are several
> separate threads. I did my best to cross-reference them, but
> apparently didn't do a good enough job.
>
> Initially I proposed to add ZSON extension [1][2] to the PostgreSQL
> core. However the idea evolved into TOAST improvements that don't
> require a user to use special types. You may also find interesting the
> related "Pluggable TOASTer" discussion [3]. The idea there was rather
> different but the discussion about extending TOAST pointers so that in
> the future we can use something else than ZSTD is relevant.
>
> You will find the recent summary of the reached agreements somewhere
> around this message [4], take a look at the thread a bit above and
> below it.
>
> I believe this effort is important. You can't, however, simply discard
> everything that was discussed in this area for the past several years.
> If you want to succeed of course. No one will look at your patch if it
> doesn't account for all the previous discussions. I'm sorry, I know
> it's disappointing. This being said you should have done better
> research before submitting the code. You could just ask if anyone was
> working on something like this before and save a lot of time.
>
> Personally I would suggest starting with one little step toward
>

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi,

> Overall idea is great.
>
> I just want to mention LZ4 also have API to use dictionary. Its dictionary
> will be as simple as "virtually prepended" text (in contrast to complex
> ZStd dictionary format).
>
> I mean, it would be great if "dictionary" will be common property for
> different algorithms.
>
> On the other hand, zstd have "super fast" mode which is actually a bit
> faster than LZ4 and compresses a bit better. So may be support for
> different algos is not essential. (But then we need a way to change
> compression level to that "super fast" mode.)
>

The zstd compression level and dictionary size are configurable at the
attribute level using ALTER TABLE. The default zstd level is 3 and the
default dictionary size is 4KB. For super-fast mode, the level can be
set to 1.

```
test=# alter table zstd alter column doc set compression zstd;
ALTER TABLE
test=# alter table zstd alter column doc set(zstd_cmp_level = 1);
ALTER TABLE
test=# select * from pg_attribute where attrelid = 'zstd'::regclass
and attname = 'doc';
-[ RECORD 1 ]--+---------------------------------
attrelid       | 16389
attname        | doc
atttypid       | 3802
attlen         | -1
attnum         | 1
atttypmod      | -1
attndims       | 0
attbyval       | f
attalign       | i
attstorage     | x
attcompression | z
attnotnull     | f
atthasdef      | f
atthasmissing  | f
attidentity    |
attgenerated   |
attisdropped   | f
attislocal     | t
attinhcount    | 0
attcollation   | 0
attstattarget  |
attacl         |
attoptions     | {zstd_dictid=1,zstd_cmp_level=1}
attfdwoptions  |
attmissingval  |
(1 row)
```




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

With the latest patch I've shared,

Using a Kaggle dataset of Nintendo-related tweets [1], we leveraged
PostgreSQL's acquire_sample_rows function to quickly gather just 1,000
sample rows for a specific attribute out of 104,695 rows. These raw
samples were passed to Zstd's dictionary training API, generating a
custom dictionary. This dictionary was then used directly to compress
the documents, resulting in 62% space savings after compression:

```
test=# \dt+
                                      List of tables
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------------+-------+----------+-------------+---------------+--------+-------------
 public | lz4            | table | nikhilkv | permanent   | heap          | 297 MB |
 public | pglz           | table | nikhilkv | permanent   | heap          | 259 MB |
 public | zstd_with_dict | table | nikhilkv | permanent   | heap          | 114 MB |
 public | zstd_wo_dict   | table | nikhilkv | permanent   | heap          | 210 MB |
(4 rows)
```

We've observed similarly strong results on other datasets as well when
using dictionaries.

[1] https://www.kaggle.com/code/dcalambas/nintendo-tweets-analysis/data
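
For reference, the training step described above boils down to a single
libzstd call. Below is a minimal sketch assuming the sampled datums have
already been copied into flat buffers; this is illustrative, not the
patch's actual sampling code:

```
#include <stdlib.h>
#include <string.h>
#include <zdict.h>

/*
 * Concatenate the sampled rows into one buffer and ask libzstd to train
 * a dictionary of at most dict_capacity bytes (the patch defaults to 4KB).
 * Returns the dictionary size, or 0 on failure.
 */
static size_t
train_dictionary(void *dict_buf, size_t dict_capacity,
                 const char **samples, const size_t *sample_sizes,
                 unsigned nb_samples)
{
    size_t      total = 0;
    size_t      off = 0;
    char       *concat;
    size_t      dict_size;

    for (unsigned i = 0; i < nb_samples; i++)
        total += sample_sizes[i];

    concat = malloc(total);
    if (concat == NULL)
        return 0;

    for (unsigned i = 0; i < nb_samples; i++)
    {
        memcpy(concat + off, samples[i], sample_sizes[i]);
        off += sample_sizes[i];
    }

    dict_size = ZDICT_trainFromBuffer(dict_buf, dict_capacity,
                                      concat, sample_sizes, nb_samples);
    free(concat);

    return ZDICT_isError(dict_size) ? 0 : dict_size;
}
```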

---
Nikhil Veldanda




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi

On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev
 wrote:
>
> Hi Nikhil,
>
> Many thanks for working on this. I proposed a similar patch some time
> ago [1] but the overall feedback was somewhat mixed so I chose to
> focus on something else. Thanks for picking this up.
>
> > test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> > build_zstd_dict_for_attribute
> > ---
> > t
> > (1 row)
>
> Did you have a chance to familiarize yourself with the corresponding
> discussion [1] and probably the previous threads? Particularly it was
> pointed out that dictionaries should be built automatically during
> VACUUM. We also discussed a special syntax for the feature, besides
> other things.
>
> [1]: 
> https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Restricting dictionary generation to the vacuum process is not ideal
because it limits user control and flexibility. Compression efficiency
is highly dependent on data distribution, which can change
dynamically. By allowing users to generate dictionaries on demand via
an API, they can optimize compression when they detect inefficiencies
rather than waiting for a vacuum process, which may not align with
their needs.

Additionally, since all dictionaries are stored in the catalog table
anyway, users can generate and manage them independently without
interfering with the system’s automatic maintenance tasks. This
approach ensures better adaptability to real-world scenarios where
compression performance needs to be monitored and adjusted in real
time.

---
Nikhil Veldanda




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi Tom,

On Thu, Mar 6, 2025 at 11:33 AM Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> >  wrote:
> >> Notably, this is the first compression algorithm for Postgres that can 
> >> make use of a dictionary to provide higher levels of compression, but 
> >> dictionaries have to be generated and maintained,
>
> > I think that solving the problems around using a dictionary is going
> > to be really hard. Can we see some evidence that the results will be
> > worth it?
>
> BTW, this is hardly the first such attempt.  See [1] for a prior
> attempt at something fairly similar, which ended up going nowhere.
> It'd be wise to understand why that failed before pressing forward.
>
> Note that the thread title for [1] is pretty misleading, as the
> original discussion about JSONB-specific compression soon migrated
> to discussion of compressing TOAST data using dictionaries.  At
> least from a ten-thousand-foot viewpoint, that seems like exactly
> what you're proposing here.  I see that you dismissed [1] as
> irrelevant upthread, but I think you'd better look closer.
>
> regards, tom lane
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Thank you for highlighting the previous discussion—I reviewed [1]
closely. While both methods involve dictionary-based compression, the
approach I'm proposing differs significantly.

The previous method explicitly extracted string values from JSONB and
assigned unique OIDs to each entry, resulting in distinct dictionary
entries for every unique value. In contrast, this approach directly
leverages Zstandard's dictionary training API. We provide raw data
samples to Zstd, which generates a dictionary of a specified size.
This dictionary is then stored in a catalog table and used to compress
subsequent inserts for the specific attribute it was trained on.
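
For context, compressing with a trained dictionary is then a single
libzstd call. A hedged sketch follows (simplified, not the patch's
code), assuming the dictionary bytes have already been fetched from the
catalog:

```
#include <zstd.h>

/*
 * Compress src using a trained dictionary; dict/dict_size would come
 * from the catalog entry for this attribute.  Error handling omitted.
 */
static size_t
compress_with_dict(void *dst, size_t dst_capacity,
                   const void *src, size_t src_size,
                   const void *dict, size_t dict_size, int level)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    size_t      n;

    n = ZSTD_compress_usingDict(cctx, dst, dst_capacity,
                                src, src_size,
                                dict, dict_size, level);
    ZSTD_freeCCtx(cctx);

    return n;                   /* callers should check ZSTD_isError(n) */
}
```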

Key differences include:

1. No new data types are required.
2. Attributes can optionally have multiple dictionaries; the latest
dictionary is used during compression, and the exact dictionary used
during compression is retrieved and applied for decompression.
3. Compression utilizes Zstandard's trained dictionaries when available.

Additionally, I have provided an option for users to define custom
sampling and training logic, as directly passing raw buffers to the
training API may not always yield optimal results, especially for
certain custom variable-length data types. This flexibility motivates
the necessary adjustments to `pg_type`.

I would greatly appreciate your feedback or any additional suggestions
you might have.

[1] 
https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Best regards,
Nikhil Veldanda




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-06 Thread Nikhil Kumar Veldanda
Hi Yura,

> So, to support "super-fast" mode you have to accept negative compression
> levels. I didn't check, probably you're already support them?
>

The key point I want to emphasize is that both the zstd compression
level and the dictionary size should be configurable based on user
preferences at the attribute level.

---
Nikhil Veldanda




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-21 Thread Nikhil Kumar Veldanda
Hi Michael,

Thanks for the feedback and the suggested patch sequence. I completely
agree—we must minimize storage overhead when dictionaries aren’t used,
while ensuring varattrib_4b remains extensible enough to handle future
compression metadata beyond dictionary ID (for other algorithms). I’ll
explore design options that satisfy both goals and share my proposal.

Best regards,
Nikhil Veldanda

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier  wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here.  There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last.  So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default.  And I think that this
> has to happen at some point if we want flexibility in this area.
>
> +struct
> +{
> +uint32va_header;
> +uint32va_tcinfo;
> +uint32va_cmp_alg;
> +uint32va_cmp_dictid;
> +charva_data[FLEXIBLE_ARRAY_MEMBER];
> +}va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch.  Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg?  It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details.  This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility.  It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits.  I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall.  Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less.  However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases.  lz4 is still very good if you
> are CPU-bound and don't care about the extra space required.  Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-28 Thread Nikhil Kumar Veldanda
Hi Robert,

Thanks for raising that question. The idea behind including a 24-bit
length field alongside the 1-byte algorithm ID is to ensure that each
compressed datum self-describes its metadata size. This allows any
compression algorithm to embed variable-length metadata (up to 16 MB)
without the need to hard-code header sizes. For instance, a future
algorithm might require different metadata lengths for each datum, and
a fixed header-size table wouldn't work. By storing the length in the
header, we maintain a generic and future-proof design. I would greatly
appreciate any feedback on this design. Thanks!
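
To illustrate, packing a 24-bit metadata length together with the
1-byte algorithm id into a single uint32 could be handled with macros
along these lines (illustrative names, not the patch's; the bit
placement assumes the length occupies the upper 24 bits):

```
/* Hedged sketch, not the patch's macros. */
#define VARATT_EXT_CMPR_ID(ext_hdr)     ((ext_hdr) & 0xFF)  /* low 8 bits */
#define VARATT_EXT_META_SIZE(ext_hdr)   ((ext_hdr) >> 8)    /* upper 24 bits */
#define VARATT_EXT_MAKE_HDR(meta_size, cmpr_id) \
    ((((uint32) (meta_size)) << 8) | ((uint32) (cmpr_id) & 0xFF))
```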

On Mon, Apr 28, 2025 at 7:50 AM Robert Haas  wrote:
>
> On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
>  wrote:
> > a. 24 bits for length → per-datum compression algorithm metadata is
> > capped at 16 MB, which is far more than any realistic compression
> > header.
> > b. 8 bits for algorithm id → up to 256 algorithms.
> > c. Zero-overhead when unused if an algorithm needs no per-datum
> > metadata (e.g., ZSTD-nodict),
>
> I don't understand why we need to spend 24 bits on a length header
> here. I agree with the idea of adding a 1-byte quantity for algorithm
> here, but I don't see why we need anything more than that. If the
> compression method is zstd-with-a-dict, then the payload data
> presumably needs to start with the OID of the dictionary, but it seems
> like in your schema every single datum would use these 3 bytes to
> store the fact that sizeof(Oid) = 4. The code that interprets
> zstd-with-dict datums should already know the header length. Even if
> generic code that works with all types of compression needs to be able
> to obtain the header length on a per-compression-type basis, there can
> be some kind of callback or table for that, rather than storing it in
> every single datum.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



-- 
Nikhil Veldanda




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-04 Thread Nikhil Kumar Veldanda
N_ID] =   \
((val) != TOAST_PGLZ_COMPRESSION_ID &&  \
(val) != TOAST_LZ4_COMPRESSION_ID  &&   \
(val) != TOAST_INVALID_COMPRESSION_ID),
TOAST_COMPRESSION_LIST
#undef X
};

#define TOAST_CMPID_EXTENDED(alg) (toast_cmpid_extended[alg])

/*
 * Prototype for a per-datum metadata-size callback:
 *   given a pointer to the extended header, return
 *   how many metadata bytes follow it.
 */
typedef uint32 (*ToastMetaSizeFn) (const varatt_cmp_extended *hdr);

/* Callback table—indexed by ToastCompressionId */
static const ToastMetaSizeFn toast_meta_size_fns[] = {
#define X(name,val,fn) [TOAST_##name##_COMPRESSION_ID] = fn,
TOAST_COMPRESSION_LIST
#undef X
};

/* Calculates algorithm metadata size */
static inline uint32
toast_cmpid_meta_size(const varatt_cmp_extended *hdr)
{
Assert(hdr != NULL);
return toast_meta_size_fns[hdr->cmp_alg] (hdr);
}

Each compression algorithm provides a static callback that returns the
size of its metadata, given a pointer to the varatt_cmp_extended
header. Algorithms with fixed-size metadata return a constant, while
algorithms with variable-length metadata are responsible for defining
and parsing their own internal headers to compute the metadata size.
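
As an illustration, the two zstd variants could provide callbacks like
these (hypothetical functions, not the patch's actual code):

```
/* Plain zstd carries no per-datum metadata. */
static uint32
zstd_nodict_meta_size(const varatt_cmp_extended *hdr)
{
    return 0;
}

/* Dictionary zstd stores a fixed-size zstd_dict_meta (the dictionary OID). */
static uint32
zstd_dict_meta_size(const varatt_cmp_extended *hdr)
{
    return sizeof(zstd_dict_meta);
}
```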

3. Resulting on-disk layouts for zstd

ZSTD (nodict) — datum on‑disk layout

+--+
| va_header (uint32)   |
+--+
| va_tcinfo (uint32)   |  ← top two bits = 11 (extended)
+--+
| cmp_alg  (uint8) |  ← (ZSTD_NODICT)
+--+
| compressed bytes …   |  ← ZSTD frame
+--+


ZSTD(dict) — datum on‑disk layout

+--+
| va_header (uint32)   |
+--+
| va_tcinfo (uint32)   |  ← top two bits = 11 (extended)
+--+
| cmp_alg  (uint8) |  ← (ZSTD_DICT)
+--+
| dict_id   (uint32)   |  ← dictionary OID
+--+
| compressed bytes …   |  ← ZSTD frame
+--+

I hope this updated design addresses your concerns. I would appreciate
any further feedback you may have. Thanks again for your guidance—it's
been very helpful.

v20-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, metadata size
dispatch and useful macros; behaviour unchanged.
v20-0002-zstd-nodict-compression.patch: Plain ZSTD (non dict) support.


--
Nikhil Veldanda
From dc6172c6945354634063b03215dc6798ae22cc2f Mon Sep 17 00:00:00 2001
From: Nikhil Kumar Veldanda 
Date: Sat, 3 May 2025 02:14:02 +
Subject: [PATCH v20 2/2] zstd nodict compression

---
 contrib/amcheck/verify_heapam.c   |   1 +
 src/backend/access/common/detoast.c   |  12 +-
 src/backend/access/common/reloptions.c|  14 +-
 src/backend/access/common/toast_compression.c | 171 +-
 src/backend/access/common/toast_internals.c   |   4 +
 src/backend/utils/adt/varlena.c   |   3 +
 src/backend/utils/misc/guc_tables.c   |   3 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/bin/psql/describe.c   |   5 +-
 src/bin/psql/tab-complete.in.c|   4 +-
 src/include/access/toast_compression.h|  36 +++-
 src/include/access/toast_internals.h  |   3 +-
 src/include/utils/attoptcache.h   |   1 +
 src/test/regress/expected/compression.out |   5 +-
 src/test/regress/expected/compression_1.out   |   3 +
 .../expected/compression_zstd_nodict.out  | 152 
 .../expected/compression_zstd_nodict_1.out| 103 +++
 src/test/regress/parallel_schedule|   2 +-
 src/test/regress/sql/compression.sql  |   1 +
 .../regress/sql/compression_zstd_nodict.sql   |  82 +
 20 files changed, 581 insertions(+), 26 deletions(-)
 create mode 100644 src/test/regress/expected/compression_zstd_nodict.out
 create mode 100644 src/test/regress/expected/compression_zstd_nodict_1.out
 create mode 100644 src/test/regress/sql/compression_zstd_nodict.sql

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index d7c2ac6951a..111bb308341 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1792,6 +1792,7 @@ check_tuple_attribute(HeapCheckContext *ctx)
 /* List of all valid compression method IDs */
 			case TOAST_PGLZ_COMPRESSION_ID:
 			case TOAST_LZ4_COMPRESSION_ID:
+			case TOAST_ZSTD_NODICT_COMPRESSION_ID:
 valid = true;
 break;
 
diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index 01419d1c65f..451230023ec 100644
--- a/src/back

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-25 Thread Nikhil Kumar Veldanda
Hi Michael,

Thanks for the suggestions. I agree that we should first solve the
"last free bit" problem with the varattrib_4b compression bits before
layering on any features. Below is the approach I've prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain-ZSTD (no dict) patch that sits cleanly on top of it.

1. Minimal but extensible header

/* varatt_cmp_extended follows va_tcinfo when the upper two bits of
 * va_tcinfo are 11.  Compressed data starts immediately after
 * ext_data.  ext_hdr encodes both the compression algorithm and the
 * byte-length of the algorithm-specific metadata.
 */
typedef struct varatt_cmp_extended
{
uint32 ext_hdr; /* [ meta_size:24 | cmpr_id:8 ] */
char   ext_data[FLEXIBLE_ARRAY_MEMBER];  /* optional metadata */
} varatt_cmp_extended;

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero overhead when unused → if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict), nothing extra is stored.

2. Algorithm registry
/*
 * TOAST compression methods enumeration.
 *
 * Each entry defines:
 *   - NAME : identifier for the compression algorithm
 *   - VALUE: numeric enum value
 *   - METADATA type: struct type holding extra info (void when none)
 *
 * The INVALID entry is a sentinel and must remain last.
 */
#define TOAST_COMPRESSION_LIST  \
X(PGLZ, 0, void) /* existing */ \
X(LZ4,  1, void) /* existing */ \
X(ZSTD_NODICT,  2, void) /* new, no metadata */ \
X(ZSTD_DICT,3, zstd_dict_meta)   /* new, needs dict_id */   \
X(INVALID,  4, void) /* sentinel */

typedef enum ToastCompressionId
{
#define X(name,val,meta) TOAST_##name##_COMPRESSION_ID = val,
TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* Example of an algorithm-specific metadata block */
typedef struct
{
uint32 dict_id; /* dictionary Oid */
} zstd_dict_meta;
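
For readability, the enum produced by this X-macro expands to:

```
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_ZSTD_NODICT_COMPRESSION_ID = 2,
    TOAST_ZSTD_DICT_COMPRESSION_ID = 3,
    TOAST_INVALID_COMPRESSION_ID = 4
} ToastCompressionId;
```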

3. Resulting on-disk layouts for zstd

ZSTD no dict: datum on-disk layout:
+----------------------+
| va_header (uint32)   |
+----------------------+
| va_tcinfo (uint32)   |  <-- (11 in top two bits specifies extended)
+----------------------+
| ext_hdr (uint32)     |  <-- [ meta size: 24 bits | compression id: 8 bits ]
+----------------------+
| Compressed bytes …   |  <-- zstd (no dictionary)
+----------------------+

ZSTD dict: datum on-disk layout:
+----------------------+
| va_header (uint32)   |
+----------------------+
| va_tcinfo (uint32)   |
+----------------------+
| ext_hdr (uint32)     |  <-- [ meta size: 24 bits | compression id: 8 bits ]
+----------------------+
| dict_id (uint32)     |  <-- zstd_dict_meta
+----------------------+
| Compressed bytes …   |  <-- zstd (dictionary)
+----------------------+

4. How does this fit?

Flexibility: Each new algorithm that needs extra metadata simply
defines its own struct and allocates varatt_cmp_extended in
setup_compression_info.
Storage: Everything in varatt_cmp_extended is copied to the datum,
immediately followed by the compressed payload.
Optional, pay-as-you-go metadata – only algorithms that need it pay for it.
Future-proof – a new compression algorithm that requires any kind of
metadata, such as a dict_id, slots into the same ext_data mechanism.

I’ve split the work into two patches for review:
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, enum plumbing,
and macros; behaviour unchanged.
v19-0002-zstd-nodict-support.patch: Plain ZSTD (non dict) support.

Please share your thoughts—and I’d love to hear feedback on the design. Thanks!

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier  wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here.  There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last.  So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one mor

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-21 Thread Nikhil Kumar Veldanda
Hi Robert,

Thank you for your feedback on the patch. You're right that my
proposed design will introduce more dictionary dependencies as
dictionaries grow; I chose this path specifically to avoid changing
existing system behavior and to prevent performance regressions in
CTAS and related commands.

After reviewing the email thread you attached in your previous
response, I identified a natural choke point for both inserts and
updates: the call to heap_toast_insert_or_update inside
heap_prepare_insert/heap_update. In the current master branch, that
function only runs when HeapTupleHasExternal is true; my patch extends
it to HeapTupleHasVarWidth tuples as well. By decompressing every
nested compressed datum at this point—no matter how deeply nested—we
can prevent any leaked datum from propagating into unrelated tables.
This mirrors the existing inlining logic in toast_tuple_init for
external toasted datums, but takes it one step further to fully
flatten the datum (decompressing compressed datums at every level, not
just the top level).
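
As a sketch of that choke point, based on the current
heap_prepare_insert() in heapam.c: the added HeapTupleHasVarWidth()
check (and the flattening done inside heap_toast_insert_or_update())
is what the patch introduces, and the exact shape here is illustrative:

```
/*
 * Hedged sketch of heap_prepare_insert(): route any tuple that contains
 * varlena attributes through the TOAST path, so nested compressed datums
 * can be flattened (decompressed) before being stored.
 */
if (relation->rd_rel->relkind != RELKIND_RELATION &&
    relation->rd_rel->relkind != RELKIND_MATVIEW)
{
    /* toast table entries should never be recursively toasted */
    Assert(!HeapTupleHasExternal(tup));
    return tup;
}
else if (HeapTupleHasExternal(tup) ||
         HeapTupleHasVarWidth(tup) ||       /* new: catch inline compressed datums */
         tup->t_len > TOAST_TUPLE_THRESHOLD)
    return heap_toast_insert_or_update(relation, tup, NULL, options);
else
    return tup;
```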

On the performance side, my basic benchmarks show almost no regression
for simple INSERT ... VALUES workloads. CTAS, however, does regress
noticeably: a CTAS completes in about 4 seconds before this patch, but
with this patch it takes roughly 24 seconds. (For reference, a normal
insert into the source table took about 58 seconds when using zstd
dictionary compression.) I suspect the extra cost comes from the added
zstd decompression and PGLZ compression on the destination table.

I’ve attached v13-0008-initial-draft-to-address-datum-leak-problem.patch,
which implements this “flatten_datum” method.

I’d love to know your thoughts on this. Am I on the right track for
solving the problem?

Best regards,
Nikhil Veldanda

On Fri, Apr 18, 2025 at 9:22 AM Robert Haas  wrote:
>
> On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
>  wrote:
> > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT 
> > ...)
> >
> > As compressed datums can be copied to other unrelated tables via CTAS,
> > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> > method inheritZstdDictionaryDependencies. This method is invoked at
> > the end of such statements and ensures that any dictionary
> > dependencies from source tables are copied to the destination table.
> > We determine the set of source tables using the relationOids field in
> > PlannedStmt.
>
> With the disclaimer that I haven't opened the patch or thought
> terribly deeply about this issue, at least not yet, my fairly strong
> suspicion is that this design is not going to work out, for multiple
> reasons. In no particular order:
>
> 1. I don't think users will like it if dependencies on a zstd
> dictionary spread like kudzu across all of their tables. I don't think
> they'd like it even if it were 100% accurate, but presumably this is
> going to add dependencies any time there MIGHT be a real dependency
> rather than only when there actually is one.
>
> 2. Inserting into a table or updating it only takes RowExclusiveLock,
> which is not even self-exclusive. I doubt that it's possible to change
> system catalogs in a concurrency-safe way with such a weak lock. For
> instance, if two sessions tried to do the same thing in concurrent
> transactions, they could both try to add the same dependency at the
> same time.
>
> 3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
> TABLE...EXECUTE are the only ways that datums can creep from one table
> into another. For example, what if I create a plpgsql function that
> gets a value from one table and stores it in a variable, and then use
> that variable to drive an INSERT into another table? I seem to recall
> there are complex cases involving records and range types and arrays,
> too, where the compressed object gets wrapped inside of another
> object; though maybe that wouldn't matter to your implementation if
> INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
> adding dependencies.
>
> When Dilip and I were working on lz4 TOAST compression, my first
> instinct was to not let LZ4-compressed datums leak out of a table by
> forcing them to be decompressed (and then possibly recompressed). We
> spent a long time trying to make that work before giving up. I think
> this is approximately where things started to unravel, and I'd suggest
> you read both this message and some of the discussion before and
> after:
>
> https://www.postgresql.org/message-id/20210316185455.5gp3c5zvvvq66...@alap3.anarazel.de
>
> I think we could add plain-old zstd compression without really
> tackling this issue, but if we are going to add dictionaries then I
> think we might need to revisit the idea of preventing things from
> leaking out of tab

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Nikhil Kumar Veldanda
Hi Robert,

On Mon, May 5, 2025 at 8:07 AM Robert Haas  wrote:
> I don't understand why we need this. I don't see why we need any sort
> of generalized concept of metadata at all here. The zstd-dict
> compression method needs to store a four-byte OID, so let it do that.
> But we don't need to brand that as metadata; and we don't need a
> method for other parts of the system to ask how much metadata exists.
> At least, I don't think we do.
>

Thank you for the feedback. My intention in introducing the
toast_cmpid_meta_size helper was to centralize header-size computation
across all compression algorithms and to provide generic macros that
can be applied to any extended compression method.

I agree that algorithm-specific metadata details and sizes need not be
exposed beyond their own routines. Each compression method inherently
knows its layout requirements and should handle them internally in its
own routines. I've removed the toast_cmpid_meta_size helper and
eliminated the metadata branding.

In the varatt_cmp_extended, the cmp_data field carries the algorithm
payload: for zstd-nodict, it’s a ZSTD frame; for zstd-dict, it’s a
four-byte dictionary OID followed by the ZSTD frame. This approach
ensures the algorithm's framing is fully self-contained in its
routines.

/*
 * varatt_cmp_extended: an optional per-datum header for extended
compression method.
 * Only used when va_tcinfo’s top two bits are “11”.
 */
typedef struct varatt_cmp_extended
{
uint8 cmp_alg;
char  cmp_data[FLEXIBLE_ARRAY_MEMBER];
} varatt_cmp_extended;
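
To illustrate how a zstd-dict routine would interpret cmp_data under
that framing, here is a hypothetical helper (not from the patch):

```
/*
 * Hedged sketch: split a zstd-dict payload into its dictionary OID and
 * the ZSTD frame that follows it.  payload_len is the length of cmp_data.
 */
static void
decode_zstd_dict_payload(const varatt_cmp_extended *hdr, size_t payload_len,
                         Oid *dict_oid, const char **frame, size_t *frame_len)
{
    memcpy(dict_oid, hdr->cmp_data, sizeof(Oid));   /* alignment-safe copy */
    *frame = hdr->cmp_data + sizeof(Oid);
    *frame_len = payload_len - sizeof(Oid);
}
```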

I've updated patch v21; please review it and let me know if you have
any questions or feedback. Thank you!

v21-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, useful macros;
behaviour unchanged.
v21-0002-zstd-nodict-compression.patch: Plain ZSTD (non dict) support
and few basic tests.


--
Nikhil Veldanda


v21-0002-zstd-nodict-compression.patch
Description: Binary data


v21-0001-varattrib_4b-design-proposal-to-make-it-extended.patch
Description: Binary data


Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-07 Thread Nikhil Kumar Veldanda
Hi Michael, Thanks for the feedback.

On Wed, May 7, 2025 at 12:49 AM Michael Paquier  wrote:
>
> I have been reading 0001 and I'm finding that the integration does not
> seem to fit much with the existing varatt_external, making the whole
> result slightly confusing.  A simple thing: the last bit that we can
> use is in varatt_external's va_extinfo, where the patch is using
> VARATT_4BCE_MASK to track that we need to go beyond varatt_external to
> know what kind of compression information we should use.  This is an
> important point, and it is not documented around varatt_external which
> still assumes that the last bit could be used for a compression
> method.  With what you are doing in 0001 (or even 0002), this becomes
> wrong.

This is the current logic used in the patch for varatt_external.

When a datum is compressed with an extended algorithm and must live in
external storage, we set the top two bits of va_extinfo
(varatt_external) to 0b11.

To figure out the compression method for an external TOAST datum:

1. Inspect the top two bits of va_extinfo.
2. If they equal 0b11(VARATT_4BCE_MASK), call
toast_get_compression_id, which invokes detoast_external_attr to fetch
the datum in its 4-byte varattrib form (no decompression) and then
reads its compression header to find the compression method.
3. Otherwise, fall back to the existing
VARATT_EXTERNAL_GET_COMPRESS_METHOD path to get the compression
method.

We use the VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED macro to determine
whether the compression method is extended.
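
In code, the check looks roughly like this (hedged sketch; the helper
name is mine, VARATT_EXTERNAL_GET_POINTER and
VARATT_EXTERNAL_GET_COMPRESS_METHOD are the existing varatt.h macros,
and VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED comes from the patch):

```
/* Hedged sketch, not the patch's exact code.  "attr" is a TOAST pointer. */
static ToastCompressionId
get_external_compression_method(struct varlena *attr)
{
    struct varatt_external toast_pointer;

    VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

    if (VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED(toast_pointer))
        /* top two bits of va_extinfo are 0b11: toast_get_compression_id()
         * fetches the 4-byte varattrib form via detoast_external_attr()
         * (no decompression) and reads the extended compression header */
        return toast_get_compression_id(attr);

    /* legacy path: the method is encoded directly in va_extinfo */
    return (ToastCompressionId) VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer);
}
```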

Across the entire codebase, external TOAST‐pointer compression methods
are only inspected in the following functions:
1. pg_column_compression
2. check_tuple_attribute (verify_heapam pg function)
3. detoast_attr_slice (just to check pglz or not)

Could you please help me understand what’s incorrect about this approach?

> Shouldn't we have a new struct portion in varattrib_4b's union for
> this purpose at least (I don't recall that we rely on varattrib_4b's
> size which would get larger with this extra byte for the new extended
> data with the three bits set for the compression are set in
> va_extinfo, correct me if I'm wrong here).
> --

In patch v21, va_compressed.va_data points to varatt_cmp_extended, so
adding it isn’t strictly necessary. If we do want to fold it into the
varattrib_4b union, we could define it like this:

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }           va_compressed;
    struct
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        uint8       cmp_alg;
        char        cmp_data[FLEXIBLE_ARRAY_MEMBER];
    }           varatt_cmp_extended;
} varattrib_4b;
```
We don't depend on varattrib_4b's size anywhere.

--
Nikhil Veldanda