On 3/14/19 12:56 PM, Kyotaro HORIGUCHI wrote: > At Wed, 13 Mar 2019 19:37:45 +1300, David Rowley > <david.row...@2ndquadrant.com> wrote in > <cakjs1f_6qdqj9m2h0jf4brkzvlpfc7o9e+mxdxrq0wgv0z1...@mail.gmail.com> >> On Wed, 13 Mar 2019 at 17:20, Kyotaro HORIGUCHI >> <horiguchi.kyot...@lab.ntt.co.jp> wrote: >>> bms_member_index seems working differently than maybe expected. >>> >>> bms_member_index((2, 4), 0) => 0, (I think) should be -1 >>> bms_member_index((2, 4), 1) => 0, should be -1 >>> bms_member_index((2, 4), 2) => 0, should be 0 >>> bms_member_index((2, 4), 3) => 1, should be -1 >>> bms_member_index((2, 4), 4) => 1, should be 1 >>> bms_member_index((2, 4), 5) => 2, should be -1 >>> bms_member_index((2, 4), 6) => 2, should be -1 >>> ... >>> bms_member_index((2, 4), 63) => 2, should be -1 >>> bms_member_index((2, 4), 64) => -1, correct >>> >>> It works correctly only when x is a member - the way the function >>> is maybe actually used in this patch -, or needs to change the >>> specifiction (or the comment) of the function. >> >> Looks like: >> >> + if (wordnum >= a->nwords) >> + return -1; >> >> should be: >> >> + if (wordnum >= a->nwords || >> + (a->word[wordnum] & ((bitmapword) 1 << bitnum)) == 0) >> + return -1; > > Yeah, seems right. >
Yep, that was broken. The attached patch fixes this by simply calling bms_is_member, instead of copying the checks into bms_member_index. I've also reworked the regression tests to use a function extracting the cardinality estimates, as proposed by Dean and David. I have not reduced the size of data sets yet, so the tests are not much faster, but we no longer check the exact query plan. That's probably a good idea anyway. Actually - the tests are a bit faster because it allows removing indexes that were used for the query plans. FWIW I've noticed an annoying thing when modifying type of column not included in a statistics. Consider this: create table t (a int, b int, c text); insert into t select mod(i,10), mod(i,10), '' from generate_series(1,10000) s(i); create statistics s (dependencies) on a,b from t; analyze t; explain analyze select * from t where a = 1 and b = 1; QUERY PLAN --------------------------------------------------------------------- Seq Scan on t (cost=0.00..205.00 rows=1000 width=9) (actual time=0.014..1.910 rows=1000 loops=1) Filter: ((a = 1) AND (b = 1)) Rows Removed by Filter: 9000 Planning Time: 0.119 ms Execution Time: 2.234 ms (5 rows) alter table t alter c type varchar(61); explain analyze select * from t where a = 1 and b = 1; QUERY PLAN --------------------------------------------------------------------- Seq Scan on t (cost=0.00..92.95 rows=253 width=148) (actual time=0.020..2.420 rows=1000 loops=1) Filter: ((a = 1) AND (b = 1)) Rows Removed by Filter: 9000 Planning Time: 0.128 ms Execution Time: 2.767 ms (5 rows) select stxdependencies from pg_statistic_ext; stxdependencies ------------------------------------------ {"1 => 2": 1.000000, "2 => 1": 1.000000} (1 row) That is, we don't remove the statistics, but the estimate still changes. But that's because the ALTER TABLE also resets reltuples/relpages: select relpages, reltuples from pg_class where relname = 't'; relpages | reltuples ----------+----------- 0 | 0 (1 row) That's a bit unfortunate, and it kinda makes the whole effort to not drop the statistics unnecessarily kinda pointless :-( regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
0001-multivariate-MCV-lists-20190315.patch.gz
Description: application/gzip
0002-multivariate-histograms-20190315.patch.gz
Description: application/gzip