On 09.04.2019 18:08, Heikki Linnakangas wrote:
On 09/04/2019 18:00, Konstantin Knizhnik wrote:
On 09.04.2019 17:09, Konstantin Knizhnik wrote:
standard Postgres heap and my VOPS extension.
As test data I used the TPC-H benchmark (actually only the lineitem table,
generated with the tpch-dbgen utility at scale factor 10, ~8GB of data).
I attached the script which I have used to populate the data (you have to
download, build and run tpch-dbgen yourself; you can also comment out the
code related to VOPS).
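
(For readers without the attached script: a minimal sketch of the projection
table assumed by the INSERT below. The column list is taken from that INSERT;
the types are only a guess based on the TPC-H schema, and USING zedstore is
the table AM clause from the zedstore branch. The attached script has the
authoritative definition.)

create table zedstore_lineitem_projection (
    l_shipdate      date,
    l_quantity      numeric,
    l_extendedprice numeric,
    l_discount      numeric,
    l_tax           numeric,
    l_returnflag    "char",
    l_linestatus    "char"
) using zedstore;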

Cool, thanks!

Unfortunately I failed to load the data into zedstore:

postgres=# insert into zedstore_lineitem_projection (select
l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag::"char",l_linestatus::"char"
from lineitem);
psql: ERROR:  compression failed. what now?
Time: 237804.775 ms (03:57.805)

Yeah, it's still early days, it will crash and burn in a lot of cases. We wanted to publish this early, to gather ideas and comments on the high level design, and to validate that the table AM API that's in v12 is usable.

Looks like the original problem was caused by the internal Postgres
compressor: I had not configured Postgres to use lz4.
When I configured Postgres --with-lz4, the data was correctly inserted into
the zedstore table, but it looks like it is not compressed at all:

postgres=# select pg_relation_size('zedstore_lineitem_projection');
   pg_relation_size
------------------
         9363010640
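
(For scale: 9363010640 bytes is about 8.7GB for just the seven projected
columns, i.e. roughly the same as the ~8GB of raw lineitem data mentioned
above, which is what suggests there is effectively no compression here.)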

The single-insert codepath isn't very optimized yet. If you populate the table with large "INSERT ... SELECT ...", you end up with a huge undo log. Try loading it with COPY.

You can also see how many pages of each type there is with:

select count(*), pg_zs_page_type('zedstore_lineitem_projection', g)
  from generate_series(0, pg_table_size('zedstore_lineitem_projection') / 8192 - 1) g group by 2;

- Heikki

postgres=# copy zedstore_lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;
COPY 59986052
Time: 232802.257 ms (03:52.802)
postgres=# select pg_relation_size('zedstore_lineitem');
 pg_relation_size
------------------
      10346504192
(1 row)
postgres=# select count(*), pg_zs_page_type('zedstore_lineitem', g)
  from generate_series(0, pg_table_size('zedstore_lineitem') / 8192 - 1) g group by 2;
  count  | pg_zs_page_type
---------+-----------------
       1 | META
 1262308 | BTREE
     692 | UNDO
(3 rows)
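
(Sanity check on the numbers above: (1 + 1262308 + 692) pages * 8192 bytes/page
= 1263001 * 8192 = 10346504192 bytes, which matches pg_relation_size exactly,
so virtually the whole relation consists of BTREE pages.)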

And now performance is much worse:
Time: 99819.476 ms (01:39.819)

It is strange, because the main advantage of a columnar store is that it has to fetch only the accessed columns. What I see is that in non-parallel mode (max_parallel_workers_per_gather = 0) the backend consumes about 11GB of memory. It fits in my desktop RAM (16GB) and the query takes ~58 seconds.
But once I start 4 parallel workers, they cause huge swapping:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28195 knizhnik  20   0 11.823g 6.553g 5.072g D   7.6 42.2   0:17.19 postgres
28074 knizhnik  20   0 11.848g 6.726g 5.223g D   7.3 43.3   4:14.96 postgres
28192 knizhnik  20   0 11.854g 6.586g 5.075g D   7.3 42.4   0:17.18 postgres
28193 knizhnik  20   0 11.870g 6.594g 5.064g D   7.3 42.4   0:17.19 postgres
28194 knizhnik  20   0 11.854g 6.589g 5.078g D   7.3 42.4   0:17.09 postgres

which is also strange because data should be present in shared buffers.
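
(The exact benchmark query is not repeated in this message; as an illustration
only, the settings for the two runs above and a TPC-H Q1-style aggregate
touching a few lineitem columns would look roughly like this:

set max_parallel_workers_per_gather = 0;   -- non-parallel run: ~58 s, ~11GB backend memory
set max_parallel_workers_per_gather = 4;   -- parallel run: causes swapping on a 16GB machine

select l_returnflag, l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       avg(l_quantity) as avg_qty,
       count(*) as count_order
  from zedstore_lineitem
 where l_shipdate <= date '1998-12-01' - interval '90 days'
 group by l_returnflag, l_linestatus
 order by l_returnflag, l_linestatus;
)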

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


