On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there
> is a big room for improvement in performance of pglz compression and
> decompression.
Yes, I believe so too.  pglz is a huge CPU consumer when it comes to
compression compared to more modern algos like lz4.

> With Vladimir we started to investigate ways to boost byte copying
> and eventually created a test suite[1] to investigate the performance
> of compression and decompression.  This is an extension with a single
> function test_pglz() which performs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations

Cool.  I have something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.

> Currently we test mostly decompression improvements against two WALs
> and one data file taken from a pgbench-generated database.  Any
> suggestions on more relevant data payloads are very welcome.

Text strings made of random data and of variable length?  For any test
of this kind I think it is good to focus on the performance of the
low-level calls, even going as far as a simple C wrapper on top of the
pglz APIs so as to measure only the codec and avoid extra PG-related
overhead like palloc(), which can be a barrier (a rough sketch of such
a wrapper is at the end of this message).  Focusing on strings with
lengths from 1kB up to 16kB would be a reasonable choice of sizes, and
it is important to keep the same uncompressed strings across runs so
that the performance numbers are comparable.

> My laptop tests show that our decompression implementation [2] can
> be from 15% to 50% faster.  Also I've noted that compression is
> extremely slow, ~30 times slower than decompression.  I believe we
> can do something about it.

That's nice.

> We focus only on boosting the existing codec without any
> consideration of other compression algorithms.

There is this as well.  A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve pglz.
A 10%~20% improvement is something worth doing.

> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs?  I have only XEON-based servers and a few
> laptops/desktops with Intel CPUs.
> 3. If compression is 30 times slower, should we rather focus on
> compression instead of decompression?

Decompression can matter a lot for mostly-read workloads, and
compression can become a bottleneck for insert-heavy loads, so
improving compression and improving decompression should be treated as
two separate problems, not two linked ones.  Any improvement in one or
the other, or even both, is nice to have.
--
Michael
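PS: for reference, a rough standalone sketch of the kind of wrapper
mentioned above could look like the following.  It is only
illustrative and untested: it assumes the PG11 prototypes from
src/common/pg_lzcompress.h (newer branches add a bool check_complete
argument to pglz_decompress()), the payload size, loop count and data
generator are arbitrary placeholders, and it would need to be built
within a Postgres source tree and linked with libpgcommon.

/*
 * Bare-bones pglz benchmark: no palloc(), no varlena headers, just the
 * low-level calls on a fixed buffer so that only the codec is measured.
 */
#include "postgres_fe.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#include "common/pg_lzcompress.h"

#define PAYLOAD_SIZE    8192        /* somewhere in the 1kB..16kB range */
#define LOOPS           10000

static double
elapsed_ms(const struct timespec *t0, const struct timespec *t1)
{
    return (t1->tv_sec - t0->tv_sec) * 1000.0 +
        (t1->tv_nsec - t0->tv_nsec) / 1000000.0;
}

int
main(void)
{
    static char source[PAYLOAD_SIZE];
    static char compressed[PGLZ_MAX_OUTPUT(PAYLOAD_SIZE)];
    static char restored[PAYLOAD_SIZE];
    int32       clen;
    struct timespec t0, t1;

    /*
     * Keep the uncompressed input identical across runs so that numbers
     * are comparable: fixed seed, mildly compressible payload.  Real
     * tests should rather use the WAL and pgbench payloads discussed in
     * this thread.
     */
    srandom(42);
    for (int i = 0; i < PAYLOAD_SIZE; i++)
        source[i] = 'a' + random() % 16;

    /*
     * PGLZ_strategy_always compresses even without space savings;
     * TOAST itself uses PGLZ_strategy_default.
     */
    clen = pglz_compress(source, PAYLOAD_SIZE, compressed,
                         PGLZ_strategy_always);
    if (clen < 0)
    {
        fprintf(stderr, "payload not compressible, adjust the generator\n");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOOPS; i++)
        (void) pglz_compress(source, PAYLOAD_SIZE, compressed,
                             PGLZ_strategy_always);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("compress:   %.1f ms for %d loops (%d -> %d bytes)\n",
           elapsed_ms(&t0, &t1), LOOPS, PAYLOAD_SIZE, clen);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOOPS; i++)
        (void) pglz_decompress(compressed, clen, restored, PAYLOAD_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("decompress: %.1f ms for %d loops\n",
           elapsed_ms(&t0, &t1), LOOPS);

    if (memcmp(source, restored, PAYLOAD_SIZE) != 0)
        fprintf(stderr, "round-trip mismatch!\n");
    return 0;
}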