On Fri, Apr 22, 2022 at 1:30 AM Aleksander Alekseev < aleksan...@timescale.com> wrote:
> Hi hackers, > > This is a follow-up thread to `RFC: compression dictionaries for JSONB` > [1]. I would like to share my current progress in order to get early > feedback. The patch is currently in a draft state but implements the basic > functionality. I did my best to account for all the great feedback I > previously got from Alvaro and Matthias. > > Usage example: > > ``` > CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb'); > > SELECT '{"aaa":"bbb"}' :: mydict; > mydict > ---------------- > {"aaa": "bbb"} > > SELECT ('{"aaa":"bbb"}' :: mydict) -> 'aaa'; > ?column? > ---------- > "bbb" > ``` > > Here `mydict` works as a transparent replacement for `jsonb`. However, its > internal representation differs. The provided dictionary entries ('aaa', > 'bbb') are stored in the new catalog table: > > ``` > SELECT * FROM pg_dict; > oid | dicttypid | dictentry > -------+-----------+----------- > 39476 | 39475 | aaa > 39477 | 39475 | bbb > (2 rows) > ``` > > When `mydict` sees 'aaa' in the document, it replaces it with the > corresponding code, in this case - 39476. For more details regarding the > compression algorithm and choosen compromises please see the comments in > the patch. > > In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as > TYPTYPE_BASE with only difference: corresponding `<type>_in` > (pg_type.typinput) and `<another-type>_<type>` (pg_cast.castfunc) > procedures receive the dictionary Oid as a `typmod` argument. This way the > procedures can distinguish `mydict1` from `mydict2` and use the proper > compression dictionary. > > The approach with alternative `typmod` role is arguably a bit hacky, but > it was the less invasive way to implement the feature I've found. I'm open > to alternative suggestions. > > Current limitations (todo): > - ALTER TYPE is not implemented > - Tests and documentation are missing > - Autocomplete is missing > > Future work (out of scope of this patch): > - Support types other than JSONB: TEXT, XML, etc > - Automatically updated dictionaries, e.g. during VACUUM > - Alternative compression algorithms. Note that this will not require any > further changes in the catalog, only the values we write to pg_type and > pg_cast will differ. > > Open questions: > - Dictionary entries are currently stored as NameData, the same type that > is used for enums. Are we OK with the accompanying limitations? Any > alternative suggestions? > - All in all, am I moving the right direction? > > Your feedback is very much welcomed! > > [1]: > https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com > > -- > Best regards, > Aleksander Alekseev > Hi, For src/backend/catalog/pg_dict.c, please add license header. + elog(ERROR, "skipbytes > decoded_size - outoffset"); Include the values for skipbytes, decoded_size and outoffset. Cheers