On 09.04.2019 18:08, Heikki Linnakangas wrote:
On 09/04/2019 18:00, Konstantin Knizhnik wrote:
On 09.04.2019 17:09, Konstantin Knizhnik wrote:
I have compared zedstore performance with the standard Postgres heap and
my VOPS extension.
As test data I used the TPC-H benchmark (actually only the lineitem table),
generated with the tpch-dbgen utility at scale factor 10 (~8GB database).
I attached the script which I used to populate the data (you have to
download, build and run tpch-dbgen yourself; you can also comment out the
code related to VOPS).
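(The attached script is not reproduced in this excerpt. A minimal sketch of
what it presumably does for the plain heap table, assuming the standard
TPC-H lineitem schema; the column types here are assumptions, and the
zedstore/VOPS parts are omitted:)

create table lineitem (
    l_orderkey      bigint,
    l_partkey       integer,
    l_suppkey       integer,
    l_linenumber    integer,
    l_quantity      numeric(12,2),
    l_extendedprice numeric(12,2),
    l_discount      numeric(12,2),
    l_tax           numeric(12,2),
    l_returnflag    char(1),
    l_linestatus    char(1),
    l_shipdate      date,
    l_commitdate    date,
    l_receiptdate   date,
    l_shipinstruct  char(25),
    l_shipmode      char(10),
    l_comment       varchar(44)
);
-- dbgen writes '|'-delimited rows; the same file is loaded into zedstore below
copy lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;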
Cool, thanks!
Unfortunately, I failed to load the data into zedstore:
postgres=# insert into zedstore_lineitem_projection (select
l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag::"char",l_linestatus::"char"
from lineitem);
psql: ERROR: compression failed. what now?
Time: 237804.775 ms (03:57.805)
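(The definition of the projection table is not shown in this excerpt;
presumably it was created with the zedstore table AM, roughly like the
sketch below, where the column types are guesses based on the casts in
the INSERT:)

create table zedstore_lineitem_projection (
    l_shipdate      date,
    l_quantity      numeric(12,2),
    l_extendedprice numeric(12,2),
    l_discount      numeric(12,2),
    l_tax           numeric(12,2),
    l_returnflag    "char",
    l_linestatus    "char"
) using zedstore;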
Yeah, it's still early days, it will crash and burn in a lot of cases.
We wanted to publish this early, to gather ideas and comments on the
high level design, and to validate that the table AM API that's in v12
is usable.
Looks like the original problem was caused by the built-in Postgres
compressor: I had not configured Postgres to use lz4.
When I built Postgres --with-lz4, the data was correctly inserted into the
zedstore table, but it looks like it is not compressed at all:
postgres=# select pg_relation_size('zedstore_lineitem_projection');
pg_relation_size
------------------
9363010640
The single-insert codepath isn't very optimized yet. If you populate
the table with large "INSERT ... SELECT ...", you end up with a huge
undo log. Try loading it with COPY.
You can also see how many pages of each type there are with:
select count(*), pg_zs_page_type('zedstore_lineitem_projection', g)
from generate_series(0,
pg_table_size('zedstore_lineitem_projection') / 8192 - 1) g group by 2;
- Heikki
postgres=# copy zedstore_lineitem from '/mnt/data/lineitem.tbl'
delimiter '|' csv;
COPY 59986052
Time: 232802.257 ms (03:52.802)
postgres=# select pg_relation_size('zedstore_lineitem');
pg_relation_size
------------------
10346504192
(1 row)
postgres=# select count(*), pg_zs_page_type('zedstore_lineitem', g)
from generate_series(0, pg_table_size('zedstore_lineitem') / 8192 -
1) g group by 2;
count | pg_zs_page_type
---------+-----------------
1 | META
1262308 | BTREE
692 | UNDO
(3 rows)
And now performance is much worse:
Time: 99819.476 ms (01:39.819)
It is strange, because the main advantage of a columnar store is that it
has to fetch only the accessed columns.
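(The query being timed is not shown in this excerpt; presumably it is a
TPC-H Q1-style aggregate, which touches only a few of the lineitem columns,
something like:)

select l_returnflag, l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
from lineitem
where l_shipdate <= date '1998-12-01' - interval '90 day'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;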
What I see is that in non-parallel mode (max_parallel_workers_per_gather = 0)
the backend consumes about 11GB of memory. It fits in my desktop's RAM (16GB)
and the query takes ~58 seconds.
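(Presumably the two runs are toggled with the GUC mentioned above, along
these lines; all other settings are assumed to be left at their defaults:)

set max_parallel_workers_per_gather = 0;   -- single-process run
set max_parallel_workers_per_gather = 4;   -- run with 4 parallel workers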
But once I start 4 parallel workers, they cause huge swapping:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28195 knizhnik  20   0 11.823g 6.553g 5.072g D   7.6 42.2   0:17.19 postgres
28074 knizhnik  20   0 11.848g 6.726g 5.223g D   7.3 43.3   4:14.96 postgres
28192 knizhnik  20   0 11.854g 6.586g 5.075g D   7.3 42.4   0:17.18 postgres
28193 knizhnik  20   0 11.870g 6.594g 5.064g D   7.3 42.4   0:17.19 postgres
28194 knizhnik  20   0 11.854g 6.589g 5.078g D   7.3 42.4   0:17.09 postgres
which is also strange, because the data should be present in shared buffers.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company