Re: Add ZSON extension to /contrib/

Andrew Dunstan Tue, 25 May 2021 13:08:48 -0700


On 5/25/21 6:55 AM, Aleksander Alekseev wrote:
> Hi hackers,
>
> Back in 2016, while at PostgresPro, I developed the ZSON extension
> [1]. The extension introduces the new ZSON type, which is 100%
> compatible with JSONB but uses a shared dictionary of strings most
> frequently used in given JSONB documents for compression. These
> strings are replaced with integer IDs. Afterward, PGLZ (and now LZ4)
> applies if the document is large enough by common PostgreSQL logic.
> Under certain conditions (many large documents), this saves disk
> space, memory and increases the overall performance. More details can
> be found in README on GitHub.
>
> The extension was warmly received, and I immediately got several
> requests to submit it to /contrib/ so people using Amazon RDS and
> similar services could enjoy it too. Back then I was not sure if the
> extension was mature enough or whether it lacked any additional features
> required to solve the real-world problems of the users. Time showed,
> however, that people are happy with the extension as it is. There were
> several minor issues discovered, but they were fixed back in 2017. The
> extension never experienced any compatibility problems with the next
> major release of PostgreSQL.
>
> So my question is if the community may consider adding ZSON to
> /contrib/. If this is the case I will add this thread to the nearest
> CF and submit a corresponding patch.
>
> [1]: https://github.com/postgrespro/zson
>
We (2ndQuadrant, now part of EDB) made some enhancements to ZSON a few years
ago, and I have permission to contribute those if this proposal is adopted.
From the readme:


1. There is an option to make zson_learn only process object keys,
rather than field values.

```
select zson_learn('{{table1,col1}}',true);
```

2. Strings with an octet length of less than 3 are not processed.
A dictionary string is encoded as 2 bytes, followed by another byte
holding the length of the following skipped bytes, so encoding a
value shorter than 3 bytes would be a net loss.
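
The arithmetic behind that threshold can be sketched in a few lines of Python. This is only a rough cost model built from the description above (2 bytes for the code plus 1 byte for the skip length), not the extension's actual code. The names `bytes_saved` and `worth_encoding` are made up for illustration, and UTF-8 is assumed for the octet count:

```python
# Assumed cost model, taken from the description above: a dictionary hit is
# stored as a 2-byte code plus 1 byte for the length of the skipped run.
CODE_BYTES = 2
SKIP_LENGTH_BYTES = 1
ENCODED_COST = CODE_BYTES + SKIP_LENGTH_BYTES


def bytes_saved(s: str) -> int:
    """Bytes saved by replacing s with a dictionary code (negative = loss)."""
    return len(s.encode("utf-8")) - ENCODED_COST


def worth_encoding(s: str) -> bool:
    """Mirror the rule above: strings under 3 octets are not processed."""
    return len(s.encode("utf-8")) >= ENCODED_COST


for s in ["id", "url", "created_at"]:
    print(s, bytes_saved(s), worth_encoding(s))
```

Under this model a 2-byte key such as "id" would grow by one byte, a 3-byte key breaks even, and longer strings start to pay off.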

3. There is a new function to create a dictionary directly from an
array of text, rather than using the learning code:

```
select zson_create_dictionary(array['word1','word2']::text[]);
```

4. There is a function to augment the current dictionary from an array of text:

```
select zson_extend_dictionary(array['value1','value2','value3']::text[]);
```

This is particularly useful for adding common field prefixes or values. A good
example of field prefixes is URL values where the first part of the URL is
fairly constrained but the last part is not.
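
One possible way to come up with such prefixes is to count them in a sample of the data and feed the frequent ones to zson_extend_dictionary. The Python below is a hypothetical helper, not part of the extension. It treats "scheme://host/" as the constrained part of a URL, which is an assumption, and the sample URLs are invented:

```python
from collections import Counter
from urllib.parse import urlsplit


def common_url_prefixes(urls, min_count=2):
    """Return scheme://host/ prefixes seen at least min_count times."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        if not (parts.scheme and parts.netloc):
            continue
        prefix = f"{parts.scheme}://{parts.netloc}/"
        # Only count it if it really is a prefix of the stored value.
        if url.startswith(prefix):
            counts[prefix] += 1
    return [prefix for prefix, n in counts.most_common() if n >= min_count]


def extend_dictionary_sql(words):
    """Render the zson_extend_dictionary call shown above for the given words."""
    quoted = ",".join("'" + w.replace("'", "''") + "'" for w in words)
    return f"select zson_extend_dictionary(array[{quoted}]::text[]);"


# Hypothetical sample data, for illustration only.
sample = [
    "https://example.com/a/1",
    "https://example.com/b/2",
    "https://example.org/x",
]
print(extend_dictionary_sql(common_url_prefixes(sample)))
```

The generated statement can then be run as-is. Whether host-level prefixes are the right granularity depends on the data.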


cheers

andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
