On 7/3/21 12:34 PM, Peter Eisentraut wrote:
> On 04.06.21 17:09, Aleksander Alekseev wrote:
>> I decided to add the patch to the nearest commitfest.

> With respect to the commit fest submission, I don't think there is consensus right now to add this.  I think people would prefer that this dictionary facility be somehow made available in the existing JSON types.  Also, I sense that there is still some volatility about some of the details of how this extension should work and its scope.  I think this is best served as an external extension for now.

I agree there are a lot of open questions to figure out, but I think this "column-level compression" capability has a lot of potential, not just for structured documents like JSON but maybe even for scalar types.

I don't think the question of whether this should be built into jsonb, provided as a separate built-in type, as a contrib module, or as an external extension is the one we need to answer first.

The first thing I'd like to see is some "proof" that it's actually useful in practice. There were claims about people/customers using it and being happy with the benefits, but no actual examples of data sets that are expected to benefit, achieved compression ratios, etc. And considering that [1] went unnoticed for 5 years, I have my doubts about it being used very widely. (I may be wrong, and maybe people are just not casting jsonb to zson.)

I tried to use this on the one large non-synthetic JSONB data set I had at hand, the bitcoin blockchain. That's ~1TB stored as JSONB, and when I tried using ZSON instead there was no measurable benefit; in fact, the database was a bit larger. But I admit btc data is rather strange, because it contains a lot of randomness (all the tx and block IDs are random-looking hashes, etc.), and there are a lot of those in each document. So maybe that's simply a data set that can't benefit from zson in principle.
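Roughly the kind of comparison I mean is this (a minimal sketch; the "blocks"/"data" names are made up, and it assumes a dictionary was already learned, see the sketch further down and the zson README for the exact API):

    -- copy the jsonb data into a zson column
    CREATE TABLE blocks_zson (id bigint, data zson);
    INSERT INTO blocks_zson SELECT id, data::zson FROM blocks;

    -- compare the on-disk sizes of the two tables
    SELECT pg_size_pretty(pg_total_relation_size('blocks'))      AS jsonb_size,
           pg_size_pretty(pg_total_relation_size('blocks_zson')) AS zson_size;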

I also suspect zson_extract_strings() is pretty inefficient, and I ran into various issues with the btc blocks, which have very many keys, often far more than the 10k limit.
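One way to get a rough idea of how many strings a single document contains (this counts string values at any nesting level, so keys would add even more; needs PG 12+ jsonpath, and the names are made up again):

    SELECT count(*) AS nstrings
      FROM blocks,
           LATERAL jsonb_path_query(data, 'lax $.**') AS s(v)
     WHERE blocks.id = 1
       AND jsonb_typeof(s.v) = 'string';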

In any case, I think having clear examples of practical data sets that benefit from using zson would be very useful, both to guide the development and to show what the potential gains are.

The other thing is that a lot of the stuff seems to be manual (e.g. the learning) and not really well integrated with the core. IMO improving this by implementing the necessary infrastructure would help all the possible cases (built-in type, contrib module, external extension).
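For example, the learning workflow currently looks something like this (a sketch with made-up names), and nothing in the system ever triggers it:

    CREATE EXTENSION zson;

    -- build the dictionary from a sample of the existing documents;
    -- this has to be re-run by hand (and the data re-compressed)
    -- whenever the dictionary gets stale
    SELECT zson_learn('{{"blocks", "data"}}');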


regards

[1] https://github.com/postgrespro/zson/commit/02db084ea3b94d9e68fd912dea97094634fcdea5

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

