On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there
> is a big room for improvement in performance of pglz compression and
> decompression.
Yes, I believe so too.  pglz is a huge CPU consumer when it comes to
compression compared to more modern algos like lz4.

> With Vladimir we started to investigate ways to boost byte copying
> and eventually created a test suite[1] to investigate the performance
> of compression and decompression.  This is an extension with a single
> function test_pglz() which performs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations

Cool.  I have something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.

> Currently we test mostly decompression improvements against two WALs
> and one data file taken from a pgbench-generated database.  Any
> suggestions on more relevant data payloads are very welcome.

Text strings made of random data and of variable length?  For any test
of this kind I think it is good to focus on the performance of the
low-level calls, even going as far as a simple C wrapper on top of the
pglz APIs so as to measure only the codec and avoid extra PG-related
overhead like palloc(), which can be a barrier (a rough sketch of such
a wrapper is at the end of this message).  Focusing on strings with
lengths from 1kB up to 16kB would be a reasonable choice of sizes, and
it is important to keep the same uncompressed strings across runs so
that the performance numbers are comparable.

> My laptop tests show that our decompression implementation [2] can
> be from 15% to 50% faster.  Also I've noted that compression is
> extremely slow, ~30 times slower than decompression.  I believe we
> can do something about it.

That's nice.

> We focus only on boosting the existing codec without any
> consideration of other compression algorithms.

There is this as well.  A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve pglz.
A 10%~20% improvement is something worth doing.

> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs?  I have only XEON-based servers and a few
> laptops/desktops with Intel CPUs.
> 3. If compression is 30 times slower, should we rather focus on
> compression instead of decompression?

Decompression can matter a lot for mostly-read workloads, and
compression can become a bottleneck for insert-heavy loads, so
improving compression and improving decompression should be treated as
two separate problems, not two linked ones.  Any improvement in one or
the other, or even both, is nice to have.
--
Michael
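PS: for reference, a rough standalone sketch of the kind of wrapper
mentioned above could look like the following.  It is only
illustrative and untested: it assumes the PG11 prototypes from
src/common/pg_lzcompress.h (newer branches add a bool check_complete
argument to pglz_decompress()), the payload size, loop count and data
generator are arbitrary placeholders, and it would need to be built
within a Postgres source tree and linked with libpgcommon.

/*
 * Bare-bones pglz benchmark: no palloc(), no varlena headers, just the
 * low-level calls on a fixed buffer so that only the codec is measured.
 */
#include "postgres_fe.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#include "common/pg_lzcompress.h"

#define PAYLOAD_SIZE    8192        /* somewhere in the 1kB..16kB range */
#define LOOPS           10000

static double
elapsed_ms(const struct timespec *t0, const struct timespec *t1)
{
    return (t1->tv_sec - t0->tv_sec) * 1000.0 +
        (t1->tv_nsec - t0->tv_nsec) / 1000000.0;
}

int
main(void)
{
    static char source[PAYLOAD_SIZE];
    static char compressed[PGLZ_MAX_OUTPUT(PAYLOAD_SIZE)];
    static char restored[PAYLOAD_SIZE];
    int32       clen;
    struct timespec t0, t1;

    /*
     * Keep the uncompressed input identical across runs so that numbers
     * are comparable: fixed seed, mildly compressible payload.  Real
     * tests should rather use the WAL and pgbench payloads discussed in
     * this thread.
     */
    srandom(42);
    for (int i = 0; i < PAYLOAD_SIZE; i++)
        source[i] = 'a' + random() % 16;

    /*
     * PGLZ_strategy_always compresses even without space savings;
     * TOAST itself uses PGLZ_strategy_default.
     */
    clen = pglz_compress(source, PAYLOAD_SIZE, compressed,
                         PGLZ_strategy_always);
    if (clen < 0)
    {
        fprintf(stderr, "payload not compressible, adjust the generator\n");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOOPS; i++)
        (void) pglz_compress(source, PAYLOAD_SIZE, compressed,
                             PGLZ_strategy_always);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("compress:   %.1f ms for %d loops (%d -> %d bytes)\n",
           elapsed_ms(&t0, &t1), LOOPS, PAYLOAD_SIZE, clen);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOOPS; i++)
        (void) pglz_decompress(compressed, clen, restored, PAYLOAD_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("decompress: %.1f ms for %d loops\n",
           elapsed_ms(&t0, &t1), LOOPS);

    if (memcmp(source, restored, PAYLOAD_SIZE) != 0)
        fprintf(stderr, "round-trip mismatch!\n");
    return 0;
}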