On 26 August 2014 11:34, Josh Berkus <j...@agliodbs.com> wrote:
> On 08/26/2014 07:51 AM, Tom Lane wrote:
> > My feeling about it at this point is that the apparent speed gain from
> > using offsets is illusory: in practically all real-world cases where there
> > are enough keys or array elements for it to matter, costs associated with
> > compression (or rather failure to compress) will dominate any savings we
> > get from offset-assisted lookups. I agree that the evidence for this
> > opinion is pretty thin ... but the evidence against it is nonexistent.
>
> Well, I have shown one test case which shows where lengths is a net
> penalty. However, for that to be the case, you have to have the
> following conditions *all* be true:
>
> * lots of top-level keys
> * short values
> * rows which are on the borderline for TOAST
> * table which fits in RAM
>
> ... so that's a "special case" and if it's sub-optimal, no biggie. Also,
> it's not like it's an order-of-magnitude slower.
>
> Anyway, I called for feedback on my blog, and have gotten some:
>
> http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html
It would be really interesting to see your results with column STORAGE
EXTERNAL for that benchmark. I think it is important to separate out the
slowdown due to decompression now being needed from that inherent in the
new format, since we can always switch off compression on a per-column
basis using STORAGE EXTERNAL (a sketch of what I mean is in the P.S. below).

My JSON data has smallish objects with a small number of keys; it barely
compresses at all with the patch and shows similar results to Arthur's
data. Across ~500K rows I get:

encoded=# select count(properties->>'submitted_by') from compressed;
 count
--------
 431948
(1 row)

Time: 250.512 ms

encoded=# select count(properties->>'submitted_by') from uncompressed;
 count
--------
 431948
(1 row)

Time: 218.552 ms

Laurence
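
P.S. In case anyone wants to repeat the comparison, this is roughly the setup
I have in mind. It is only a sketch, assuming a jsonb column named properties
and tables named compressed/uncompressed as in the timings above:

ALTER TABLE uncompressed ALTER COLUMN properties SET STORAGE EXTERNAL;
-- SET STORAGE only affects rows written afterwards, so the data needs to be
-- reloaded for the setting to take effect, e.g. (assuming identical schemas):
-- TRUNCATE uncompressed;
-- INSERT INTO uncompressed SELECT * FROM compressed;

-- To confirm how much compression is actually happening, compare the average
-- on-disk value size in each table (pg_column_size reports the stored,
-- possibly compressed, size):
SELECT avg(pg_column_size(properties)) FROM compressed;
SELECT avg(pg_column_size(properties)) FROM uncompressed;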