On 7/3/21 12:34 PM, Peter Eisentraut wrote:
> On 04.06.21 17:09, Aleksander Alekseev wrote:
>> I decided to add the patch to the nearest commitfest.

> With respect to the commit fest submission, I don't think there is consensus right now to add this.  I think people would prefer that this dictionary facility be somehow made available in the existing JSON types.  Also, I sense that there is still some volatility about some of the details of how this extension should work and its scope.  I think this is best served as an external extension for now.

I agree there are a lot of open questions to figure out, but I think this "column-level compression" capability has a lot of potential, not just for structured documents like JSON but maybe even for scalar types.

I don't think the question of whether this should be built into jsonb, provided as a separate built-in type, as a contrib module, or as an external extension is the one we need to answer first.

The first thing I'd like to see is some "proof" that it's actually useful in practice. There were claims about people/customers using it and being happy with the benefits, but no actual examples of data sets that are expected to benefit, achieved compression ratios, etc. And considering that [1] went unnoticed for 5 years, I have my doubts about it being used very widely. (I may be wrong, and maybe people are just not casting jsonb to zson.)

I tried to use this on the one large non-synthetic JSONB data set I had at hand, the bitcoin blockchain. That's ~1TB stored as JSONB, and when I tried using ZSON instead there was no measurable benefit; in fact, the database was a bit larger. But I admit btc data is rather strange, because it contains a lot of randomness (all the tx and block IDs are random-looking hashes, etc.), and there are a lot of those in each document. So maybe that's simply a data set that can't benefit from zson in principle.
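Roughly the kind of comparison I mean is this (a minimal sketch; the "blocks"/"data" names are made up, and it assumes a dictionary was already learned, see the sketch further down and the zson README for the exact API):

    -- copy the jsonb data into a zson column
    CREATE TABLE blocks_zson (id bigint, data zson);
    INSERT INTO blocks_zson SELECT id, data::zson FROM blocks;

    -- compare the on-disk sizes of the two tables
    SELECT pg_size_pretty(pg_total_relation_size('blocks'))      AS jsonb_size,
           pg_size_pretty(pg_total_relation_size('blocks_zson')) AS zson_size;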

I also suspect zson_extract_strings() is pretty inefficient, and I ran into various issues with the btc blocks, which have very many keys, often far more than the 10k limit.
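One way to get a rough idea of how many strings a single document contains (this counts string values at any nesting level, so keys would add even more; needs PG 12+ jsonpath, and the names are made up again):

    SELECT count(*) AS nstrings
      FROM blocks,
           LATERAL jsonb_path_query(data, 'lax $.**') AS s(v)
     WHERE blocks.id = 1
       AND jsonb_typeof(s.v) = 'string';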

In any case, I think having clear examples of practical data sets that benefit from using zson would be very useful, both to guide the development and to show what the potential gains are.

The other thing is that a lot of the stuff seems to be manual (e.g. the learning) and not really well integrated with the core. IMO improving this by implementing the necessary infrastructure would help all the possible cases (built-in type, contrib module, external extension).
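For example, the learning workflow currently looks something like this (a sketch with made-up names), and nothing in the system ever triggers it:

    CREATE EXTENSION zson;

    -- build the dictionary from a sample of the existing documents;
    -- this has to be re-run by hand (and the data re-compressed)
    -- whenever the dictionary gets stale
    SELECT zson_learn('{{"blocks", "data"}}');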


regards

[1] https://github.com/postgrespro/zson/commit/02db084ea3b94d9e68fd912dea97094634fcdea5

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

