On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:
On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:On 12/17/2013 12:22 AM, Alexander Korotkov wrote:On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas <hlinnakan...@vmware.comwrote:On 12/12/2013 06:44 PM, Alexander Korotkov wrote: When values are packed into small groups, we have to either insertinefficiently encoded value or re-encode whole right part of values.It would probably be simplest to store newly inserted items uncompressed, in a separate area in the page. For example, grow the list of uncompressed items downwards from pg_upper, and the compressed items upwards from pg_lower. When the page fills up, re-encode the whole page.I hacked together an implementation of a variant of Simple9, to see how it performs. Insertions are handled per the above scheme.Here's an updated version of that, using the page layout without item-indexes that I described in the other post. This is much less buggy than that earlier crude version I posted - and unfortunately it doesn't compress as well. The earlier version lost some items :-(. Nevertheless, I think this page layout and code formatting is better, even if we switch the encoding back to the varbyte encoding in the end.
Yet another version. The encoding/decoding code is now quite isolated in ginpostinglist.c, so it's easy to experiment with different encodings. This patch uses varbyte encoding again.
I got a bit carried away, experimented with a bunch of different encodings. I tried rice encoding, rice encoding with block and offset number delta stored separately, the simple9 variant, and varbyte encoding.
The compressed size obviously depends a lot on the distribution of the items, but in the test set I used, the differences between different encodings were quite small.
One fatal problem with many encodings is VACUUM. If a page is completely full and you remove one item, the result must still fit. In other words, removing an item must never enlarge the space needed. Otherwise we have to be able to split on vacuum, which adds a lot of code, and also makes it possible for VACUUM to fail if there is no disk space left. That's unpleasant if you're trying to run VACUUM to release disk space. (gin fast updates already has that problem BTW, but let's not make it worse)
I believe that eliminates all encodings in the Simple family, as well as PForDelta, and surprisingly also Rice encoding. For example, if you have three items in consecutive offsets, the differences between them are encoded as 11 in rice encoding. If you remove the middle item, the encoding for the next item becomes 010, which takes more space than the original.
AFAICS varbyte encoding is safe from that. (a formal proof would be nice though).
So, I'm happy to go with varbyte encoding now, indeed I don't think we have much choice unless someone can come up with an alternative that's VACUUM-safe. I have to put this patch aside for a while now, I spent a lot more time on these encoding experiments than I intended. If you could take a look at this latest version, spend some time reviewing it and cleaning up any obsolete comments, and re-run the performance tests you did earlier, that would be great. One thing I'm slightly worried about is the overhead of merging the compressed and uncompressed posting lists in a scan. This patch will be in good shape for the final commitfest, or even before that.
- Heikki
gin-packed-postinglists-varbyte2.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers