Re: [HACKERS] jsonb format is pessimal for toast compression

Andrew Dunstan Fri, 08 Aug 2014 09:37:00 -0700


On 08/08/2014 11:54 AM, Tom Lane wrote:

Andrew Dunstan writes:

On 08/08/2014 11:18 AM, Tom Lane wrote:

That's not really the issue here, I think.  The problem is that a
relatively minor aspect of the representation, namely the choice to store
a series of offsets rather than a series of lengths, produces
nonrepetitive data even when the original input is repetitive.
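
As a rough illustration of the offsets-versus-lengths point, here is a minimal sketch. It assumes a made-up header of 32-bit integers (not the real JEntry layout) and uses zlib as a stand-in for pglz, the compressor TOAST actually uses, so the exact numbers will not carry over. Only the direction of the difference is the point.

```python
import struct
import zlib

# Hypothetical layout: 1000 values, each 8 bytes long, described by a
# header of 32-bit little-endian integers. This is NOT the JEntry format.
lengths = [8] * 1000

# Offsets are the running end positions: 8, 16, 24, ...
offsets = []
end = 0
for n in lengths:
    end += n
    offsets.append(end)

lengths_header = struct.pack("<1000I", *lengths)
offsets_header = struct.pack("<1000I", *offsets)

# zlib is an assumption here; pglz behaves differently in detail.
lengths_compressed = len(zlib.compress(lengths_header))
offsets_compressed = len(zlib.compress(offsets_header))

print("raw header bytes:  ", len(lengths_header))
print("lengths compressed:", lengths_compressed)
print("offsets compressed:", offsets_compressed)
```

The lengths header is one 4-byte pattern repeated, while every entry of the offsets header is different, so the compressor has much less to work with in the second case.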

It would certainly be worth validating that changing this would fix the
problem.
I don't know how invasive that would be - I suspect (without looking
very closely) not terribly much.

I took a quick look and saw that this wouldn't be that easy to get around.
As I'd suspected upthread, there are places that do random access into a
JEntry array, such as the binary search in findJsonbValueFromContainer().
If we have to add up all the preceding lengths to locate the corresponding
value part, we lose the performance advantages of binary search.  AFAICS
that's applied directly to the on-disk representation.  I'd thought
perhaps there was always a transformation step to build a pointer list,
but nope.

It would be interesting to know what the performance hit would be if we
calculated the offsets/pointers on the fly, especially if we could cache
it somehow. The main benefit of binary search is in saving on
comparisons, especially of strings, ISTM, and that could still be
available - this would just be a bit of extra arithmetic.
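
A minimal sketch of that idea, under stated assumptions: the container below is a toy (a list of lengths plus one byte string, keys first and then values, keys sorted bytewise), and build_container and find_value are hypothetical names, not anything in the PostgreSQL source. It stores only lengths, rebuilds the end offsets with one pass of additions, and then does an ordinary binary search, so the number of string comparisons stays logarithmic.

```python
from itertools import accumulate

def build_container(pairs):
    """Pack key/value strings into (lengths, data): a toy stand-in for a
    JEntry array that stores lengths, followed by the data area."""
    encoded = sorted((k.encode(), v.encode()) for k, v in pairs)
    parts = [k for k, _ in encoded] + [v for _, v in encoded]
    return [len(p) for p in parts], b"".join(parts)

def find_value(lengths, data, key):
    """Return (value or None, number of key comparisons performed)."""
    nkeys = len(lengths) // 2
    # Offsets computed on the fly: O(n) integer additions. A cache could
    # keep this list per container instead of rebuilding it every call.
    ends = list(accumulate(lengths))

    def part(i):
        start = ends[i - 1] if i else 0
        return data[start:ends[i]]

    target = key.encode()
    lo, hi = 0, nkeys
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = part(mid)
        comparisons += 1
        if candidate == target:
            return part(nkeys + mid).decode(), comparisons
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid
    return None, comparisons

pairs = [(f"key{i:04d}", f"value{i}") for i in range(1000)]
lengths, data = build_container(pairs)
value, comparisons = find_value(lengths, data, "key0421")
print(value, comparisons)
```

Whether the extra additions matter in practice would have to be measured against the real code; this only shows that the comparison count is unaffected by storing lengths.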


cheers

andrew



