Hi hackers,

> So far we seem to have a consensus to:
>
> 1. Use bytea instead of NameData to store dictionary entries;
>
> 2. Assign monotonically ascending IDs to the entries instead of using
> Oids, as it is done with pg_attribute.attnum. In order to do this we
> should either add a corresponding column to pg_type, or add a new
> catalog table, e.g. pg_dict_meta. Personally I don't have a strong
> opinion on what is better. Thoughts?
>
> Both changes should be straightforward to implement and also are a
> good exercise to newcomers.
>
> I invite anyone interested to join this effort as a co-author! (since,
> honestly, rewriting the same feature over and over again alone is
> quite boring :D).

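For anyone picking this up, item 2 in the quoted list could look roughly like the toy model below. It is only a sketch, written in Python for brevity, and not the patch itself: the class `Dictionary`, its `next_id` counter and the methods `add` and `lookup` are made-up names, and the counter merely stands in for whatever ends up being stored in pg_type or pg_dict_meta. Entries are plain byte strings, as they would be with bytea.

```python
class Dictionary:
    """Toy model of one compression dictionary (hypothetical names)."""

    def __init__(self):
        # Assumption: a per-dictionary counter, analogous to a column in
        # pg_type or a row in a pg_dict_meta catalog.
        self.next_id = 0
        self.id_by_entry = {}   # bytes -> int
        self.entry_by_id = {}   # int -> bytes

    def add(self, entry: bytes) -> int:
        """Return the ID of entry, assigning the next ascending ID if new."""
        existing = self.id_by_entry.get(entry)
        if existing is not None:
            return existing
        new_id = self.next_id
        self.next_id += 1
        self.id_by_entry[entry] = new_id
        self.entry_by_id[new_id] = entry
        return new_id

    def lookup(self, entry_id: int) -> bytes:
        """Return the entry stored under entry_id (KeyError if unknown)."""
        return self.entry_by_id[entry_id]
```

IDs start at 0 within each dictionary and never depend on a global Oid counter; adding an entry that is already present returns its existing ID.
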
cfbot complained that v5 doesn't apply anymore. Here is the rebased
version of the patch.

> Good point. This was not a problem for ZSON since the dictionary size
> was limited to 2**16 entries, the dictionary was immutable, and the
> dictionaries had versions. For compression dictionaries we removed the
> 2**16 entries limit and also decided to get rid of versions. The idea
> was that you can simply continue adding new entries, but no one
> thought about the fact that this will consume the memory required to
> decompress the document indefinitely.
>
> Maybe we should return to the idea of limited dictionary size and
> versions. Objections?
> [ ...]
> You are right. Another reason to return to the idea of dictionary versions.

Since no one objected so far and/or proposed a better idea I assume
this can be added to the list of TODOs as well.

-- 
Best regards,
Aleksander Alekseev
v6-0001-Compression-dictionaries-for-JSONB.patch
Description: Binary data