On Sun, May 04, 2025 at 05:54:34AM -0700, Nikhil Kumar Veldanda wrote:
> 3. Resulting on-disk layouts for zstd
> 
> ZSTD (nodict) — datum on‑disk layout
> 
> +----------------------------------+
> | va_header (uint32)               |
> +----------------------------------+
> | va_tcinfo (uint32)               |  ← top two bits = 11 (extended)
> +----------------------------------+
> | cmp_alg  (uint8)                 |  ← (ZSTD_NODICT)
> +----------------------------------+
> | compressed bytes …               |  ← ZSTD frame
> +----------------------------------+

This makes sense, yes.  You are allocating an extra byte after
va_tcinfo that serves as a redirection if the three bits dedicated to
the compression method are set.

> ZSTD(dict) — datum on‑disk layout
> +----------------------------------+
> | va_header (uint32)               |
> +----------------------------------+
> | va_tcinfo (uint32)               |  ← top two bits = 11 (extended)
> +----------------------------------+
> | cmp_alg  (uint8)                 |  ← (ZSTD_DICT)
> +----------------------------------+
> | dict_id   (uint32)               |  ← dictionary OID
> +----------------------------------+
> | compressed bytes …               |  ← ZSTD frame
> +----------------------------------+
> 
> I hope this updated design addresses your concerns. I would appreciate
> any further feedback you may have. Thanks again for your guidance—it's
> been very helpful.

That makes sense as well structurally if we include a dictionary for
each value.  Not sure that we need that much space, for this purpose,
though.  We are going to need the extra byte anyway AFAIK, so better
to start with that.

I have been reading 0001 and I'm finding that the integration does not
seem to fit much with the existing varatt_external, making the whole
result slightly confusing.  A simple thing: the last bit that we can
use is in varatt_external's va_extinfo, where the patch is using
VARATT_4BCE_MASK to track that we need to go beyond varatt_external to
know what kind of compression information we should use.  This is an
important point, and it is not documented around varatt_external which
still assumes that the last bit could be used for a compression
method.  With what you are doing in 0001 (or even 0002), this becomes
wrong.

Shouldn't we have a new struct portion in varattrib_4b's union for
this purpose at least (I don't recall that we rely on varattrib_4b's
size which would get larger with this extra byte for the new extended
data with the three bits set for the compression are set in
va_extinfo, correct me if I'm wrong here).
--
Michael

Attachment: signature.asc
Description: PGP signature

Reply via email to