On 12/02/2017 09:24 PM, konstantin knizhnik wrote:
> 
> On Dec 2, 2017, at 6:04 PM, Tomas Vondra wrote:
> 
>> On 12/01/2017 10:52 PM, Andres Freund wrote:
>> ...
>>
>> Other algorithms (e.g. zstd) got significantly better compression
>> (25%) compared to pglz, but in exchange for longer compression
>> times. I'm sure we could lower the compression level to make it
>> faster, but that will of course hurt the compression ratio.
>>
>> I don't think switching to a different compression algorithm is a
>> way forward - it was proposed and explored repeatedly in the past,
>> and every time it failed for a number of reasons, most of which are
>> still valid.
>>
>> Firstly, it's going to be quite hard (or perhaps impossible) to
>> find an algorithm that is "universally better" than pglz. Some
>> algorithms do work better for text documents, some for binary
>> blobs, etc. I don't think there's a win-win option.
>>
>> Sure, there are workloads where pglz performs poorly (I've seen
>> such cases too), but IMHO that's more an argument for the custom
>> compression method approach. pglz gives you good default
>> compression in most cases, and you can change it for columns where
>> it matters, and where a different space/time trade-off makes
>> sense.
>>
>> Secondly, all the previous attempts ran into some legal issues,
>> i.e. licensing and/or patents. Maybe the situation changed since
>> then (no idea, haven't looked into that), but in the past the
>> "pluggable" approach was proposed as a way to address this.
>>
> 
> Maybe it will be interesting for you to see the following results
> of applying page-level compression (CFS in PgPro-EE) to pgbench
> data:
> 

I don't follow. If I understand what CFS does correctly (and I'm mostly
guessing here, because I haven't seen the code published anywhere, and
I assume it's proprietary), it essentially compresses whole 8kB blocks.

I don't know whether it reorganizes the data into a columnar format
first in some way (to make it more "columnar" and thus more
compressible), which would make it somewhat similar to page-level
compression in Oracle.

But it's clearly a very different approach from what the patch aims to
improve (compressing individual varlena values).
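
To make the distinction concrete, here is a rough standalone sketch
(not PostgreSQL code - the page layout, value size and use of libzstd
are just assumptions for illustration) comparing compression of a whole
8kB block with compression of each small value separately:

/*
 * Rough standalone sketch, not PostgreSQL code: contrast compressing
 * one whole 8kB block (roughly what page-level compression does, as
 * far as I understand it) with compressing each small value on its
 * own (what TOAST-style per-value compression operates on).  The page
 * layout, value size and use of libzstd are assumptions made purely
 * for illustration.  Build with: cc sketch.c -lzstd
 */
#include <stdio.h>
#include <zstd.h>

#define PAGE_SIZE   8192
#define VALUE_SIZE  128
#define NUM_VALUES  (PAGE_SIZE / VALUE_SIZE)

int
main(void)
{
    static char page[PAGE_SIZE];
    static char out[2 * PAGE_SIZE];     /* generous output buffer */
    size_t      page_cmp;
    size_t      value_cmp_total = 0;
    int         i;

    /* fill the "page" with mildly redundant synthetic data */
    for (i = 0; i < NUM_VALUES; i++)
        snprintf(page + i * VALUE_SIZE, VALUE_SIZE,
                 "row %d: some repetitive payload, some repetitive payload", i);

    /* page-level: a single call over the whole 8kB block */
    page_cmp = ZSTD_compress(out, sizeof(out), page, PAGE_SIZE, 1);
    if (ZSTD_isError(page_cmp))
        return 1;

    /* value-level: compress every small value separately */
    for (i = 0; i < NUM_VALUES; i++)
    {
        size_t      r = ZSTD_compress(out, sizeof(out),
                                      page + i * VALUE_SIZE, VALUE_SIZE, 1);

        if (ZSTD_isError(r))
            return 1;
        value_cmp_total += r;
    }

    printf("page-level:  %zu bytes\n", page_cmp);
    printf("value-level: %zu bytes (sum over %d values)\n",
           value_cmp_total, NUM_VALUES);
    return 0;
}

With values this small, the per-value approach typically can't do much
(each call only sees 128 bytes and pays the per-frame overhead again
and again), while the page-level call sees all the cross-row redundancy
at once - which is why I don't think the two approaches are directly
comparable.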
> 
> All algorithms (except zlib) were used with the best-speed option:
> using a higher compression level usually has a fairly small impact
> on the compression ratio (<30%), but can significantly increase the
> compression time (several times). Certainly pgbench is not the best
> candidate for testing compression algorithms: it generates a lot of
> artificial and redundant data. But we measured it also on real
> customer data, and zstd still seems to be the best compression
> method: it provides good compression with the smallest CPU overhead.
> 

I think this really depends on the dataset, and drawing conclusions
based on a single test is somewhat crazy - especially when it's
synthetic pgbench data with lots of inherent redundancy (sequential
IDs, ...).

My takeaway from the results is rather that page-level compression may
be very beneficial in some cases, although I wonder how much of that
can be gained by simply using a compressed filesystem (thus making it
transparent to PostgreSQL).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services