Hi Nikhil,

> Thank you for highlighting the previous discussion—I reviewed [1]
> closely. While both methods involve dictionary-based compression, the
> approach I'm proposing differs significantly.
>
> The previous method explicitly extracted string values from JSONB and
> assigned unique OIDs to each entry, resulting in distinct dictionary
> entries for every unique value. In contrast, this approach directly
> leverages Zstandard's dictionary training API. We provide raw data
> samples to Zstd, which generates a dictionary of a specified size.
> This dictionary is then stored in a catalog table and used to compress
> subsequent inserts for the specific attribute it was trained on.
>
> [...]

You didn't read closely enough, I'm afraid. As Tom pointed out, the
title of the thread is misleading. On top of that, there are several
separate threads. I did my best to cross-reference them, but apparently
didn't do it well enough.

Initially I proposed adding the ZSON extension [1][2] to the PostgreSQL
core. However, the idea evolved into TOAST improvements that don't
require a user to use special types. You may also find the related
"Pluggable TOASTer" discussion [3] interesting. The idea there was
rather different, but the discussion about extending TOAST pointers so
that in the future we can use something other than ZSTD is relevant.
You will find a recent summary of the agreements reached somewhere
around this message [4]; take a look at the thread a bit above and
below it.

I believe this effort is important. You can't, however, simply discard
everything that was discussed in this area over the past several years.
If you want to succeed, of course. No one will look at your patch if it
doesn't account for all the previous discussions. I'm sorry, I know
it's disappointing. That being said, you should have done better
research before submitting the code. You could have just asked whether
anyone had worked on something like this before and saved a lot of
time.

Personally, I would suggest starting with one small step toward
compression dictionaries, focusing in particular on the extensibility
of TOAST pointers. You are going to need to store dictionary IDs there
and allow other compression algorithms to be used in the future. This
will require something like a varint/UTF-8-like bitmask. See the
previous discussions.

[1]: https://github.com/afiskon/zson
[2]: https://postgr.es/m/CAJ7c6TP3fCC9TNKJBQAcEf4c%3DL7XQZ7QvuUayLgjhNQMD_5M_A%40mail.gmail.com
[3]: https://postgr.es/m/224711f9-83b7-a307-b17f-4457ab73aa0a%40sigaev.ru
[4]: https://postgr.es/m/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

--
Best regards,
Aleksander Alekseev