On Mon, Aug 19, 2019 at 04:15:30PM -0700, Alexandra Wang wrote: > On Sun, Aug 18, 2019 at 12:35 PM Justin Pryzby <pry...@telsasoft.com> wrote: > > > . I was missing a way to check for compression ratio; > > Here are the ways to check compression ratio for zedstore: > > Table level: > SELECT sum(uncompressedsz::numeric)/sum(totalsz) AS compratio FROM > pg_zs_btree_pages(<tablename>);
postgres=# SELECT sum(uncompressedsz::numeric)/sum(totalsz) AS compratio FROM pg_zs_btree_pages('child.cdrs_huawei_pgwrecord_2019_07_01'); compratio | 4.2730304163521529 For a fair test, I created a separate ZFS tablspace for storing just a copy of that table. ts=# CREATE TABLE test TABLESPACE testcomp AS SELECT * FROM child.cdrs_huawei_pgwrecord_2019_07_01; SELECT 39933381 Time: 882417.775 ms (14:42.418) zfs/testJTP20190819 compressratio 6.01x - zfs/testJTP20190819 compression gzip-1 inherited from zfs > Per column level: > select attno, count(*), sum(uncompressedsz::numeric)/sum(totalsz) as > compratio from pg_zs_btree_pages(<tablename>) group by attno order by attno; Order by 3; I see we have SOME highly compressed columns. It's still surprising to me that's as low as it is, given their content: phone numbers and IPv4 addresses in text form, using characters limited to [[:digit:].] (I realize we can probably save space using inet type.) 0 | 4743 | 1.00000000000000000000 32 | 21912 | 1.05953637381493823513 80 | 36441 | 1.2416446300175039 4 | 45059 | 1.3184106811322728 83 | 45059 | 1.3184106811322728 52 | 39208 | 1.3900788061770992 ... 74 | 3464 | 10.8258665101057364 17 | 3535 | 10.8776086243096534 3 | 7092 | 11.0388009154683678 11 | 3518 | 11.4396055611832109 65 | 3333 | 14.6594723104237634 35 | 14077 | 15.1642131499381887 ... 43 | 1601 | 21.4200106784573211 79 | 1599 | 21.4487670806076829 89 | 1934 | 23.6292134031933401 33 | 1934 | 23.6292134031933401 It seems clear the columns with high n_distinct have low compress ratio, and columns with high compress ratio are those with n_distinct=1... CREATE TEMP TABLE zs AS SELECT zs.*, n_distinct, avg_width, a.attname FROM (SELECT 'child.cdrs_huawei_pgwrecord_2019_07_01'::regclass t)t , LATERAL (SELECT attno, count(*), sum(uncompressedsz::numeric)/sum(totalsz) AS compratio FROM pg_zs_btree_pages(t) GROUP BY attno)zs , pg_attribute a, pg_class c, pg_stats s WHERE a.attrelid=t AND a.attnum=zs.attno AND c.oid=a.attrelid AND c.relname=s.tablename AND s.attname=a.attname; n_distinct | compratio ------------+------------------------ 217141 | 1.2416446300175039 154829 | 1.5306062496764190 144486 | 1.3900788061770992 128334 | 1.5395022739568842 121324 | 1.4005533187886683 86341 | 1.6262709389296389 84073 | 4.4379336418590519 65413 | 5.1890181028038757 63703 | 5.5029855093836425 63637 | 5.3648468796642262 46450 | 1.3184106811322728 46450 | 1.3184106811322728 43029 | 1.8003513772661308 39363 | 1.5845730687475706 36720 | 1.4751147557399539 36445 | 1.8403087513759131 36445 | 1.5453935268318613 11455 | 1.05953637381493823513 2862 | 9.8649823666870671 2625 | 2.3573614181847621 1376 | 1.7895024285340428 1335 | 2.2812551964262787 807 | 7.1192324141359373 610 | 7.9373623460089360 16 | 11.4396055611832109 10 | 5.5429763442365557 7 | 5.0440578041440675 7 | 5.2000132813261135 4 | 6.9741514753325536 4 | 4.2872818036896340 3 | 1.9080838412634827 3 | 2.9915954457453485 3 | 2.3056387009407882 2 | 10.8776086243096534 2 | 5.5950929307378287 2 | 18.5796576388128741 2 | 10.8258665101057364 2 | 9.1112820658021406 2 | 3.4986057630739795 2 | 4.6250999234025238 2 | 11.0388009154683678 1 | 15.1642131499381887 1 | 2.8855860118178798 1 | 23.6292134031933401 1 | 21.4200106784573211 [...] > > it looks like zedstore > > with lz4 gets ~4.6x for our largest customer's largest table. zfs using > > compress=gzip-1 gives 6x compression across all their partitioned > > tables, > > and I'm surprised it beats zedstore . > > > > What kind of tables did you use? Is it possible to give us the schema > of the table? Did you perform 'INSERT INTO ... SELECT' or COPY? I did this: |time ~/src/postgresql.bin/bin/pg_restore /srv/cdrperfbackup/ts/final/child.cdrs_huawei_pgwrecord_2019_07_01 -f- |PGOPTIONS='-cdefault_table_access_method=zedstore' psql --port 5678 postgres --host /tmp ... COPY 39933381 ... real 100m25.764s child | cdrs_huawei_pgwrecord_2019_07_01 | table | pryzbyj | permanent | 8277 MB | postgres=# SELECT array_to_string(array_agg(format_type(atttypid, atttypmod) ||CASE WHEN attnotnull THEN ' not null' ELSE '' END ORDER BY attnum),',') FROM pg_attribute WHERE attrelid='child.cdrs_huawei_pgwrecord_2019_07_01'::regclass AND attnum>0; array_to_string | text not null,text,text not null,text not null,text not null,text,text,text,boolean,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,timestamp without time zone not null,bigint not null,text not null,text,text,text,text,text,text,text,text,text,text not null,text,boolean,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,timestamp with time zone,timestamp with time zone,text,text,boolean,text,text,boolean,boolean,text not null,text not null