On Tue, Nov 22, 2016 at 2:42 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> On 11/21/2016 11:10 PM, Robert Haas wrote: > >> [ reviving an old multivariate statistics thread ] >> >> On Thu, Nov 13, 2014 at 6:31 AM, Simon Riggs <si...@2ndquadrant.com> >> wrote: >> >>> On 12 October 2014 23:00, Tomas Vondra <t...@fuzzy.cz> wrote: >>> >>> It however seems to be working sufficiently well at this point, enough >>>> to get some useful feedback. So here we go. >>>> >>> >>> This looks interesting and useful. >>> >>> What I'd like to check before a detailed review is that this has >>> sufficient applicability to be useful. >>> >>> My understanding is that Q9 and Q18 of TPC-H have poor plans as a >>> result of multi-column stats errors. >>> >>> Could you look at those queries and confirm that this patch can >>> produce better plans for them? >>> >> >> Tomas, did you ever do any testing in this area? One of my >> colleagues, Rafia Sabih, recently did some testing of TPC-H queries @ >> 20 GB. Q18 actually doesn't complete at all right now because of an >> issue with the new simplehash implementation. I reported it to Andres >> and he tracked it down, but hasn't posted the patch yet - see >> http://archives.postgresql.org/message-id/20161115192802.jfb >> ec5s6ougxw...@alap3.anarazel.de >> >> Of the remaining queries, the slowest are Q9 and Q20, and both of them >> have serious estimation errors. On Q9, things go wrong here: >> >> -> Merge Join >> (cost=5225092.04..6595105.57 rows=154 width=47) (actual >> time=103592.821..149335.010 rows=6503988 loops=1) >> Merge Cond: >> (partsupp.ps_partkey = lineitem.l_partkey) >> Join Filter: >> (lineitem.l_suppkey = partsupp.ps_suppkey) >> Rows Removed by Join Filter: >> 19511964 >> -> Index Scan using >> > > [snip] > >> >> Rows Removed by Filter: 756627 >> >> The estimate for the index scan on partsupp is essentially perfect, >> and the lineitem-part join is off by about 3x. However, the merge >> join is off by about 4000x, which is real bad. >> >> > The patch only deals with statistics on base relations, no joins, at this > point. It's meant to be extended in that direction, so the syntax supports > it, but at this point that's all. No joins. > > That being said, this estimate should be improved in 9.6, when you create > a foreign key between the tables. In fact, that patch was exactly about Q9. > > This is how the join estimate looks on scale 1 without the FK between the > two tables: > > QUERY PLAN > ----------------------------------------------------------------------- > Merge Join (cost=19.19..700980.12 rows=2404 width=261) > Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND > (lineitem.l_suppkey = partsupp.ps_suppkey)) > -> Index Scan using idx_lineitem_part_supp on lineitem > (cost=0.43..605856.84 rows=6001117 width=117) > -> Index Scan using partsupp_pkey on partsupp > (cost=0.42..61141.76 rows=800000 width=144) > (4 rows) > > > and with the foreign key: > > QUERY PLAN > ----------------------------------------------------------------------- > Merge Join (cost=19.19..700980.12 rows=6001117 width=261) > (actual rows=6001215 loops=1) > Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND > (lineitem.l_suppkey = partsupp.ps_suppkey)) > -> Index Scan using idx_lineitem_part_supp on lineitem > (cost=0.43..605856.84 rows=6001117 width=117) > (actual rows=6001215 loops=1) > -> Index Scan using partsupp_pkey on partsupp > (cost=0.42..61141.76 rows=800000 width=144) > (actual rows=6001672 loops=1) > Planning time: 3.840 ms > Execution time: 21987.913 ms > (6 rows) > > > On Q20, things go wrong here: >> > > > >> [snip] >> >> The estimate for the GroupAggregate feeding one side of the merge join >> is quite accurate. The estimate for the part-partsupp join on the >> other side is off by 8x. Then things get much worse: the estimate for >> the merge join is off by 400x. >> >> > Well, most of the estimation error comes from the join, but sadly the > aggregate makes using the foreign keys impossible - at least in the current > version. I don't know if it can be improved, somehow. > > I'm not really sure whether the multivariate statistics stuff will fix >> this kind of case or not, but if it did it would be awesome. >> >> > Join statistics are something I'd like to add eventually, but I don't see > how it could happen in the first version. Also, the patch received no > reviews this CF, and making it even larger is unlikely to make it more > attractive. > Moved to next CF with "needs review" status. Regards, Hari Babu Fujitsu Australia