On 09.04.2019 18:08, Heikki Linnakangas wrote:
On 09/04/2019 18:00, Konstantin Knizhnik wrote:
On 09.04.2019 17:09, Konstantin Knizhnik wrote:
I have compared zedstore performance with the standard Postgres heap and
my VOPS extension.
As test data I used the TPC-H benchmark (actually only the lineitem table),
generated with the tpch-dbgen utility at scale factor 10 (~8GB database).
I attached the script which I used to populate the data (you have to
download, build and run tpch-dbgen yourself; you can also comment out the
code related to VOPS).
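(The attached script is not reproduced in this excerpt. A minimal sketch of
what it presumably does for the plain heap table, assuming the standard
TPC-H lineitem schema; the column types here are assumptions, and the
zedstore/VOPS parts are omitted:)

create table lineitem (
    l_orderkey      bigint,
    l_partkey       integer,
    l_suppkey       integer,
    l_linenumber    integer,
    l_quantity      numeric(12,2),
    l_extendedprice numeric(12,2),
    l_discount      numeric(12,2),
    l_tax           numeric(12,2),
    l_returnflag    char(1),
    l_linestatus    char(1),
    l_shipdate      date,
    l_commitdate    date,
    l_receiptdate   date,
    l_shipinstruct  char(25),
    l_shipmode      char(10),
    l_comment       varchar(44)
);
-- dbgen writes '|'-delimited rows; the same file is loaded into zedstore below
copy lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;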
Cool, thanks!
Unfortunately, I failed to load the data into zedstore:
postgres=# insert into zedstore_lineitem_projection (select
l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag::"char",l_linestatus::"char"
from lineitem);
psql: ERROR: compression failed. what now?
Time: 237804.775 ms (03:57.805)
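(The definition of the projection table is not shown in this excerpt;
presumably it was created with the zedstore table AM, roughly like the
sketch below, where the column types are guesses based on the casts in
the INSERT:)

create table zedstore_lineitem_projection (
    l_shipdate      date,
    l_quantity      numeric(12,2),
    l_extendedprice numeric(12,2),
    l_discount      numeric(12,2),
    l_tax           numeric(12,2),
    l_returnflag    "char",
    l_linestatus    "char"
) using zedstore;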
Yeah, it's still early days, it will crash and burn in a lot of cases.
We wanted to publish this early, to gather ideas and comments on the
high level design, and to validate that the table AM API that's in v12
is usable.
Looks like the original problem was caused by the built-in Postgres
compressor: I had not configured Postgres to use lz4.
When I built Postgres --with-lz4, the data was correctly inserted into the
zedstore table, but it looks like it is not compressed at all:
postgres=# select pg_relation_size('zedstore_lineitem_projection');
pg_relation_size
------------------
9363010640
The single-insert codepath isn't very optimized yet. If you populate
the table with large "INSERT ... SELECT ...", you end up with a huge
undo log. Try loading it with COPY.
You can also see how many pages of each type there are with:
select count(*), pg_zs_page_type('zedstore_lineitem_projection', g)
from generate_series(0,
pg_table_size('zedstore_lineitem_projection') / 8192 - 1) g group by 2;
- Heikki
postgres=# copy zedstore_lineitem from '/mnt/data/lineitem.tbl'
delimiter '|' csv;
COPY 59986052
Time: 232802.257 ms (03:52.802)
postgres=# select pg_relation_size('zedstore_lineitem');
pg_relation_size
------------------
10346504192
(1 row)
postgres=# select count(*), pg_zs_page_type('zedstore_lineitem', g)
from generate_series(0, pg_table_size('zedstore_lineitem') / 8192 -
1) g group by 2;
count | pg_zs_page_type
---------+-----------------
1 | META
1262308 | BTREE
692 | UNDO
(3 rows)
And now performance is much worse:
Time: 99819.476 ms (01:39.819)
It is strange, because the main advantage of a columnar store is that it
has to fetch only the accessed columns.
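(The query being timed is not shown in this excerpt; presumably it is a
TPC-H Q1-style aggregate, which touches only a few of the lineitem columns,
something like:)

select l_returnflag, l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
from lineitem
where l_shipdate <= date '1998-12-01' - interval '90 day'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;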
What I see is that in non-parallel mode (max_parallel_workers_per_gather = 0)
the backend consumes about 11GB of memory. It fits in my desktop's RAM (16GB)
and the query takes ~58 seconds.
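(Presumably the two runs are toggled with the GUC mentioned above, along
these lines; all other settings are assumed to be left at their defaults:)

set max_parallel_workers_per_gather = 0;   -- single-process run
set max_parallel_workers_per_gather = 4;   -- run with 4 parallel workers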
But once I start 4 parallel workers, they cause huge swapping:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28195 knizhnik  20   0 11.823g 6.553g 5.072g D   7.6 42.2   0:17.19 postgres
28074 knizhnik  20   0 11.848g 6.726g 5.223g D   7.3 43.3   4:14.96 postgres
28192 knizhnik  20   0 11.854g 6.586g 5.075g D   7.3 42.4   0:17.18 postgres
28193 knizhnik  20   0 11.870g 6.594g 5.064g D   7.3 42.4   0:17.19 postgres
28194 knizhnik  20   0 11.854g 6.589g 5.078g D   7.3 42.4   0:17.09 postgres
which is also strange, because the data should be present in shared buffers.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company