> On Aug 11, 2021, at 10:38 AM, Tomas Vondra <tomas.von...@enterprisedb.com> 
> wrote:
> 
> So I'm a bit puzzled about the claim that random data make the problems more 
> extreme. Can you explain?

Hmm... you appear to be right.

I changed the gentest.pl script to fill the tables with randomized data, but 
the random data was being regenerated on each test run (since the calls to 
random() are in the gentest.sql file).  Adding an explicit setseed() call in the 
test, so that the data is identical before and after applying your patch, 
eliminates the differences.
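For reference, a minimal sketch of the fix (the table and column names here are 
hypothetical, not the literal gentest.sql contents):

```sql
-- Hypothetical sketch: seeding the PRNG makes each run generate
-- identical data, so before/after comparisons are apples-to-apples.
SELECT setseed(0.5);        -- must run before the random() calls below
INSERT INTO t (a, b)
SELECT random(), random()
  FROM generate_series(1, 10000);
```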

So there are three tests here.  The first tests deterministic orderly data.  
The second tests deterministic random data without repeats, and hence without 
meaningful MCV statistics.  The third tests deterministic random data rounded 
into twenty buckets skewed towards lower-numbered buckets, and hence with both 
repeats and meaningful MCV statistics.
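The skewed bucketing in the third set could be produced along these lines (a 
hypothetical expression for illustration, not the literal test code):

```sql
-- Hypothetical sketch: multiplying two uniform random() values skews
-- the product toward zero, so lower-numbered buckets receive more rows,
-- producing repeats and hence a meaningful MCV list.
SELECT setseed(0.5);
INSERT INTO t (a)
SELECT 1 + floor(20 * random() * random())::int   -- buckets 1..20, skewed low
  FROM generate_series(1, 10000);
```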

The original test set:

        TOTAL:
                better: 77827
                worse: 12317

The random test set, with setseed() calls to make it deterministic:

        TOTAL:
                better: 49708
                worse: 19393

The random test set, with setseed() calls to make it deterministic plus 
rounding into buckets:

        TOTAL:
                better: 81764
                worse: 19594

Once the data is made deterministic, the third set looks slightly better than 
the first, rather than slightly worse.  But almost 20% of the query types still 
look worse after applying the patch.  I'm going to dig deeper into those to see 
if that conclusion survives bumping up the size of the dataset.  It will take 
quite some time to run the tests with a huge dataset, but I don't see how else 
to investigate this.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company