On 06/01/15 00:08, Tom Lane wrote:
Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
On 05/30/15 23:16, Tom Lane wrote:
Attached is a draft patch for that.  It fixes the problem for me:

Seems to be working OK, but I still do get a Bitmap Heap Scan there (but
more about that later).

Attached is an incremental patch (on top of the previous one) to
allow startup cost of parameterized paths to be considered when the
relation is the RHS of a semi or anti join. It seems reasonably clean
except for one thing: logically, we perhaps should remove the checks
on path->param_info from the last half of
compare_path_costs_fuzzily(), so as to increase the symmetry between
parameterized paths and unparameterized ones. However, when I did
that, I got changes in some of the regression test plans, and they
didn't seem to be for the better. So I left that alone. As-is, this
patch doesn't seem to affect the results for any existing regression
tests.

Seems to be working fine. I've tried a bunch of queries modifying the test case in various ways, and all seem to be planned fine. I've been unable to come up with a query that'd get planned badly.

Regarding the remaining checks in compare_path_costs_fuzzily(), isn't that simply caused by very small data sets? For example the first "failing" plan in join.sql looks like this:

 Nested Loop Left Join  (cost=0.29..22.80 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Join Filter: ("*VALUES*".column1 = i1.f1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
   ->  Materialize  (cost=0.29..22.64 rows=5 width=8)
         Output: i1.f1, (666)
         ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
               Output: i1.f1, 666
               ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
                     Output: i1.f1
               ->  Index Only Scan using tenk1_unique2 on public....
                     Output: i2.unique2
                     Index Cond: (i2.unique2 = i1.f1)

while with the changes it'd look like this:

 Hash Right Join  (cost=0.34..22.70 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Hash Cond: (i1.f1 = "*VALUES*".column1)
   ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
         Output: i1.f1, 666
         ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
               Output: i1.f1
         ->  Index Only Scan using tenk1_unique2 on public.tenk1 ...
               Output: i2.unique2
               Index Cond: (i2.unique2 = i1.f1)
   ->  Hash  (cost=0.03..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 ...
               Output: "*VALUES*".column1
(14 rows)

So the planner actually believes the plan to be cheaper, although only by a tiny margin. And I see pretty much no difference in planning/exec time (but I'm on a machine with power-management and VMs, so a lot of random noise).

But once int4_tbl gets bigger (say, 160k rows instead of the 5), the hash join clearly wins even with the current code. Actually, the switch away from the current plan happens way sooner - at 500 rows it's already using the hash join, and I see about the same timings.
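In case anyone wants to repeat the scaling test, something along these lines works against the regression database. The exact query shape is only inferred from the plans above, and int4_big / the row counts are just what I used, so treat it as a sketch:

```sql
-- Same shape as int4_tbl, but with enough rows that the cost gap
-- between the nested loop and the hash join becomes decisive.
CREATE TABLE int4_big (f1 int4);
INSERT INTO int4_big SELECT i FROM generate_series(1, 160000) i;
ANALYZE int4_big;

-- Roughly the join.sql query, with int4_big substituted for int4_tbl.
EXPLAIN (VERBOSE)
SELECT v.column1, sub.f1, sub.x
FROM (VALUES (1), (2)) v(column1)
LEFT JOIN (SELECT i1.f1, 666 AS x
           FROM int4_big i1
           LEFT JOIN tenk1 i2 ON i2.unique2 = i1.f1) sub
  ON v.column1 = sub.f1;
```

With 5 rows both plans cost nearly the same, which is why the fuzzy comparison picks either; dropping the row count from 160000 down toward 500 shows where the planner flips back.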

I don't really see why you consider those plan changes bad. And even if they are, isn't that simply a matter of tuning the cost parameters?


Do you plan to push that into 9.5, or 9.6? I assume it's a behavior change, so no back-patching, right?

Mumble. It's definitely a planner bug fix, and I believe that the
effects are narrowly constrained to cases where significantly better
plans are possible. So personally I'd be willing to back-patch it.
But others might see that differently, especially since it's been
like this for a long time and we only have one field complaint.

+1 to back-patching from me. It's true we only have one field complaint, but I believe there are more users impacted by this. They just didn't notice - we only really got this complaint because of the plan discrepancy between queries that are almost exactly the same.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

