Hi, On 2020-06-19 13:03:02 -0400, Robert Haas wrote: > - I can see three possible ways of breaking our dependence on 'pglz' > for TOAST compression. Option #1 is to pick one new algorithm which we > think is better than 'pglz' in all relevant ways and use it as the > default for all new compressed datums. This would be dramatically > simpler than what this patch does, because there would be no user > interface. It would just be out with the old and in with the new. > Option #2 is to create a short list of new algorithms that have > different trade-offs; e.g. one that is very fast (like lz4) and one > that has an extremely high compression ratio, and provide an interface > for users to choose between them. This would be moderately simpler > than what this patch does, because we would expose to the user > anything about how a new compression method could be added, but it > would still require a UI for the user to choose between the available > (and hard-coded) options. It has the further advantage that every > PostgreSQL cluster will offer the same options (or a subset of them, > perhaps, depending on configure flags) and so you don't have to worry > that, say, a pg_am row gets lost and suddenly all of your toasted data > is inaccessible and uninterpretable. Option #3 is to do what this > patch actually does, which is to allow for the addition of any number > of compressors, including by extensions. It has the advantage that new > compressors can be added with core's permission, so, for example, if > it is unclear whether some excellent compressor is free of patent > problems, we can elect not to ship support for it in core, while at > the same time people who are willing to accept the associated legal > risk can add that functionality to their own copy as an extension > without having to patch core. The legal climate may even vary by > jurisdiction, so what might be questionable in country A might be > clearly just fine in country B. Aside from those issues, this approach > allows people to experiment and innovate outside of core relatively > quickly, instead of being bound by the somewhat cumbrous development > process which has left this patch in limbo for the last few years. My > view is that option #1 is likely to be impractical, because getting > people to agree is hard, and better things are likely to come along > later, and people like options. So I prefer either #2 or #3.
I personally favor going for #2, at least initially. Then we can discuss the runtime-extensibility of #3 separately. > - The next question is how a datum compressed with some non-default > method should be represented on disk. The patch handles this first of > all by making the observation that the compressed size can't be >=1GB, > because the uncompressed size can't be >=1GB, and we wouldn't have > stored it compressed if it expanded. Therefore, the upper two bits of > the compressed size should always be zero on disk, and the patch > steals one of them to indicate whether "custom" compression is in use. > If it is, the 4-byte varlena header is followed not only by a 4-byte > size (now with the new flag bit also included) but also by a 4-byte > OID, indicating the compression AM in use. I don't think this is a > terrible approach, but I don't think it's amazing, either. 4 bytes is > quite a bit to use for this; if I guess correctly what will be a > typical cluster configuration, you probably would really only need > about 2 bits. For a datum that is both stored externally and > compressed, the overhead is likely negligible, because the length is > probably measured in kB or MB. But for a datum that is compressed but > not stored externally, it seems pretty expensive; the datum is > probably short, and having an extra 4 bytes of uncompressible data > kinda sucks. One possibility would be to allow only one byte here: > require each compression AM that is installed to advertise a one-byte > value that will denote its compressed datums. If more than one AM > tries to claim the same byte value, complain. Another possibility is > to abandon this approach and go with #2 from the previous paragraph. > Or maybe we add 1 or 2 "privileged" built-in compressors that get > dedicated bit-patterns in the upper 2 bits of the size field, with the > last bit pattern being reserved for future algorithms. (e.g. 0x00 = > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for > details). Agreed. I favor an approach roughly like I'd implemented below https://postgr.es/m/20130605150144.GD28067%40alap2.anarazel.de I.e. leave the vartag etc as-is, but utilize the fact that pglz compressed datums starts with a 4 byte length header, and that due to the 1GB limit, the first two bits currently have to be 0. That allows to indicate 2 compression methods without any space overhead, and additional compression methods are supported by using an additional byte (or some variable length encoded larger amount) if both bits are 1. > - Yet another possible approach to the on-disk format is to leave > varatt_external.va_extsize and varattrib_4b.rawsize untouched and > instead add new compression methods by adding new vartag_external > values. There's quite a lot of bit-space available there: we have a > whole byte, and we're currently only using 4 values. We could easily > add a half-dozen new possibilities there for new compression methods > without sweating the bit-space consumption. The main thing I don't > like about this is that it only seems like a useful way to provide for > out-of-line compression. Perhaps it could be generalized to allow for > inline compression as well, but it seems like it would take some > hacking. One additional note: Adding additional vartag_external values does incur some noticable cost, distributed across lots of places. > - One thing I really don't like about the patch is that it consumes a > bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask > bits are at a premium, and there's been no real progress in the decade > plus that I've been hanging around here in clawing back any bit-space. > I think we really need to avoid burning our remaining bits for > anything other than a really critical need, and I don't think I > understand what the need is in this case. I might be missing > something, but I'd really strongly suggest looking for a way to get > rid of this. It also invents the concept of a TupleDesc flag, and the > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we > need that, either. +many Small note: The current patch adds #include "postgres.h" to a few headers - it shouldn't do so. Greetings, Andres Freund