Re: [HACKERS] Custom compression methods

Andres Freund Mon, 22 Jun 2020 13:54:24 -0700

Hi,

On 2020-06-19 13:03:02 -0400, Robert Haas wrote:
> - I can see three possible ways of breaking our dependence on 'pglz'
> for TOAST compression. Option #1 is to pick one new algorithm which we
> think is better than 'pglz' in all relevant ways and use it as the
> default for all new compressed datums. This would be dramatically
> simpler than what this patch does, because there would be no user
> interface. It would just be out with the old and in with the new.
> Option #2 is to create a short list of new algorithms that have
> different trade-offs; e.g. one that is very fast (like lz4) and one
> that has an extremely high compression ratio, and provide an interface
> for users to choose between them. This would be moderately simpler
> than what this patch does, because we would expose to the user
> anything about how a new compression method could be added, but it
> would still require a UI for the user to choose between the available
> (and hard-coded) options. It has the further advantage that every
> PostgreSQL cluster will offer the same options (or a subset of them,
> perhaps, depending on configure flags) and so you don't have to worry
> that, say, a pg_am row gets lost and suddenly all of your toasted data
> is inaccessible and uninterpretable. Option #3 is to do what this
> patch actually does, which is to allow for the addition of any number
> of compressors, including by extensions. It has the advantage that new
> compressors can be added with core's permission, so, for example, if
> it is unclear whether some excellent compressor is free of patent
> problems, we can elect not to ship support for it in core, while at
> the same time people who are willing to accept the associated legal
> risk can add that functionality to their own copy as an extension
> without having to patch core. The legal climate may even vary by
> jurisdiction, so what might be questionable in country A might be
> clearly just fine in country B. Aside from those issues, this approach
> allows people to experiment and innovate outside of core relatively
> quickly, instead of being bound by the somewhat cumbrous development
> process which has left this patch in limbo for the last few years. My
> view is that option #1 is likely to be impractical, because getting
> people to agree is hard, and better things are likely to come along
> later, and people like options. So I prefer either #2 or #3.


I personally favor going for #2, at least initially. Then we can discuss
the runtime-extensibility of #3 separately.


> - The next question is how a datum compressed with some non-default
> method should be represented on disk. The patch handles this first of
> all by making the observation that the compressed size can't be >=1GB,
> because the uncompressed size can't be >=1GB, and we wouldn't have
> stored it compressed if it expanded. Therefore, the upper two bits of
> the compressed size should always be zero on disk, and the patch
> steals one of them to indicate whether "custom" compression is in use.
> If it is, the 4-byte varlena header is followed not only by a 4-byte
> size (now with the new flag bit also included) but also by a 4-byte
> OID, indicating the compression AM in use. I don't think this is a
> terrible approach, but I don't think it's amazing, either. 4 bytes is
> quite a bit to use for this; if I guess correctly what will be a
> typical cluster configuration, you probably would really only need
> about 2 bits. For a datum that is both stored externally and
> compressed, the overhead is likely negligible, because the length is
> probably measured in kB or MB. But for a datum that is compressed but
> not stored externally, it seems pretty expensive; the datum is
> probably short, and having an extra 4 bytes of uncompressible data
> kinda sucks. One possibility would be to allow only one byte here:
> require each compression AM that is installed to advertise a one-byte
> value that will denote its compressed datums. If more than one AM
> tries to claim the same byte value, complain. Another possibility is
> to abandon this approach and go with #2 from the previous paragraph.
> Or maybe we add 1 or 2 "privileged" built-in compressors that get
> dedicated bit-patterns in the upper 2 bits of the size field, with the
> last bit pattern being reserved for future algorithms. (e.g. 0x00 =
> pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for
> details).

Agreed. I favor an approach roughly like I'd implemented below
https://postgr.es/m/20130605150144.GD28067%40alap2.anarazel.de

I.e. leave the vartag etc as-is, but utilize the fact that pglz
compressed datums starts with a 4 byte length header, and that due to
the 1GB limit, the first two bits currently have to be 0. That allows to
indicate 2 compression methods without any space overhead, and
additional compression methods are supported by using an additional byte
(or some variable length encoded larger amount) if both bits are 1.


> - Yet another possible approach to the on-disk format is to leave
> varatt_external.va_extsize and varattrib_4b.rawsize untouched and
> instead add new compression methods by adding new vartag_external
> values. There's quite a lot of bit-space available there: we have a
> whole byte, and we're currently only using 4 values. We could easily
> add a half-dozen new possibilities there for new compression methods
> without sweating the bit-space consumption. The main thing I don't
> like about this is that it only seems like a useful way to provide for
> out-of-line compression. Perhaps it could be generalized to allow for
> inline compression as well, but it seems like it would take some
> hacking.

One additional note: Adding additional vartag_external values does incur
some noticable cost, distributed across lots of places.


> - One thing I really don't like about the patch is that it consumes a
> bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask
> bits are at a premium, and there's been no real progress in the decade
> plus that I've been hanging around here in clawing back any bit-space.
> I think we really need to avoid burning our remaining bits for
> anything other than a really critical need, and I don't think I
> understand what the need is in this case. I might be missing
> something, but I'd really strongly suggest looking for a way to get
> rid of this. It also invents the concept of a TupleDesc flag, and the
> flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we
> need that, either.

+many


Small note: The current patch adds #include "postgres.h" to a few
headers - it shouldn't do so.

Greetings,

Andres Freund

Re: [HACKERS] Custom compression methods

Reply via email to