Matthias, Nikita,

Many thanks for the feedback!

> Any type with typlen < 0 should work, right?

Right.

> The use of dictionaries should be dependent on only the use of a
> compression method that supports pre-computed compression
> dictionaries. I think storage=MAIN + compression dictionaries should
> be supported, to make sure there is no expensive TOAST lookup for the
> attributes of the tuple; but that doesn't seem to be an option with
> that design.

> I don't think it's a good idea to interfere with the storage strategies.
> Dictionary should be a kind of storage option, like a compression, but
> not the strategy declining all others.

My reasoning behind this proposal was as follows.

Let's not forget that MAIN attributes *can* be stored in a TOAST table
as a last resort, and also that EXTENDED attributes are compressed
in-place first, and are stored in a TOAST table *only* if this is needed
to fit a tuple in toast_tuple_target bytes (which the user can
additionally change). So whether in practice it's going to be
advantageous to distinguish MAIN+dict.compressed and
EXTENDED+dict.compressed attributes seems to be debatable.

Basically the only difference between MAIN and EXTENDED is the priority
the four-stage TOASTing algorithm gives to the corresponding attributes.
I would assume that if the user wants dictionary compression, the
attribute should be highly compressible and thus always EXTENDED. (We
seem to use MAIN for types that are not that well compressible.)

This being said, if the majority believes we should introduce a new
entity and keep storage strategies as is, I'm fine with that. This
perhaps is not going to be the most convenient interface for the user.
On the flip side, it's going to be flexible. It's all about compromise.

> I think "AT_AC SET COMPRESSION lz4 {[WITH | WITHOUT] DICTIONARY}",
> "AT_AC SET COMPRESSION lz4-dictionary", or "AT_AC SET
> compression_dictionary = on" would be better from a design
> perspective.

> Agree with Matthias on above.

OK, unless someone objects, we have a consensus here.

> Didn't we get zstd support recently as well?

Unfortunately, it is not used for TOAST. In fact I vaguely recall that
ZSTD support for TOAST may have been explicitly rejected. Don't quote me
on that, however...

I think it's going to be awkward to support PGLZ/LZ4 for COMPRESSION and
LZ4/ZSTD for dictionary compression. As a user, I would personally prefer
having one set of compression algorithms that can be used with TOAST.

Perhaps for the PoC we could focus on LZ4, and maybe PGLZ, if we choose to
use PGLZ for compression dictionaries too. We can always discuss ZSTD
separately.

> Can we specify a default compression method for each postgresql type,
> just like how we specify the default storage? If not, then the setting
> could realistically be in conflict with a default_toast_compression
> setting, assuming that dictionary support is not a requirement for
> column compression methods.

No, only STORAGE can be specified [1].

> The toast pointer must store enough info about the compression used to
> decompress the datum, which implies it needs to store the compression
> algorithm used, and a reference to the compression dictionary (if
> any). I think the idea about introducing a new toast pointer type (in
> the custom toast patch) wasn't bad per se, and that change would allow
> us to carry more or different info in the header.

> The Pluggable TOAST was rejected, but we have a lot of improvements
> based on changing the TOAST pointer structure.

Interestingly, it looks like we ended up working on TOAST improvement
after all. I'm almost certain that we will have to modify TOAST pointers
to a certain degree in order to make it work. Hopefully it's not going
to be too invasive.

[1]: https://www.postgresql.org/docs/current/sql-createtype.html

--
Best regards,
Aleksander Alekseev