Re: [HACKERS] multivariate statistics v14

Tomas Vondra Wed, 23 Mar 2016 06:24:09 -0700

On 03/23/2016 06:20 AM, Tatsuo Ishii wrote:

I am now looking into the create statistics doc to see if the example
appearing in it is working. I will get back if I find any.


I have the ref doc: CREATE STATISTICS

There are nice examples how the multivariate statistics gives better
row number estimation. So I gave them a try.

"Create table t1 with two functionally dependent columns,
 i.e. knowledge of a value in the first column is sufficient for
 determining the value in the other column" The example creates table
 "t1", then populates it using generate_series. After CREATE
 STATISTICS, ANALYZE and EXPLAIN. I expected the EXPLAIN demonstrates
 how result rows estimation is enhanced by using the multivariate
 statistics.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..19425.00 rows=98 width=8) (actual 
time=76.876..76.876 rows=0 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 1000000
 Planning time: 0.146 ms
 Execution time: 76.896 ms
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..19425.00 rows=1 width=8) (actual 
time=78.867..78.867 rows=0 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 1000000
 Planning time: 0.102 ms
 Execution time: 78.885 ms
(5 rows)

It seems the row numbers estimation (98) using the multivariate
statistics is actually *worse* than the one (1) not using the
statistics because the actual row number is 0.

Yes, there's a mistake in the first query, because the conditionsactually are not compatible. I.e. (i/100)=1 and (i/500)=1 have nooverlapping rows, clearly. It should be


EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);

instead. Will fix.


Next example (using table "t2") is much better than the case using t1.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..19425.00 rows=9633 width=8) (actual 
time=0.012..75.350 rows=10000 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 990000
 Planning time: 0.107 ms
 Execution time: 75.680 ms
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..19425.00 rows=91 width=8) (actual 
time=0.008..76.614 rows=10000 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 990000
 Planning time: 0.067 ms
 Execution time: 76.935 ms
(5 rows)

This time it seems the row numbers estimation (9633) using the
multivariate statistics is much better than the one (91) not using the
statistics because the actual row number is 10000.

The last example (using table "t3") seems no effect by multivariate statistics.

Yes. There's a typo in the example - it analyzes the wrong table (t2instead of t3). Once I fix that, the estimates are much better.

In summary, the only case which shows the effect of the multivariate
statistics is the "t2" case. So I don't see why other examples are
shown in the manual. Am I missing something?

No, thanks for spotting those mistakes. I'll fix them and submit a newversion of the patch - either later today or perhaps tomorrow.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multivariate statistics v14

Reply via email to