Re: [HACKERS] nested loop semijoin estimates

2015-06-06 Thread Tomas Vondra
FWIW, I've repeated the TPC-DS tests on a much larger data set (50GB) today, and I see that (a) 3f59be836c555fa679bbe0ec76de50a8b5cb23e0 (ANTI/SEMI join costing) changes nothing - there are some small cost changes, but only in plans involving semi/anti-joins (which is expected). Neverthe

Re: [HACKERS] nested loop semijoin estimates

2015-06-05 Thread Robert Haas
On Tue, Jun 2, 2015 at 10:10 PM, Tom Lane wrote: > What it seems like we should do, if we want to back-patch this, is apply > it without the add_path_precheck changes. Then as an independent > HEAD-only patch, change add_path_precheck so that it's behaving as > designed. It looks to me like that

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
Tomas Vondra writes: > On 06/02/15 23:27, Tom Lane wrote: >> Do we have instructions around here anyplace on how to set up/use >>> TPC-DS? I couldn't find anything about it on the wiki ... > Not that I'm aware of, but it's not really all that difficult. > [ instructions... ] Thanks. I added som

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
I wrote: > Hm. In principle, add_path_precheck shouldn't be doing anything except > rejecting paths that would get rejected anyway later on. However it > strikes me that there's a logic error in it; specifically, this > simplifying assumption is wrong: > * For speed, we make exact rather

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
Tomas Vondra writes: > So, if required_outer=false and both p->_startup=false, we get > consider_startup=false irrespectedly of the required_outer value, so > (!consider_startupe) != required_outer > so that the conditions are not equivalent. And indeed, by reverting the > if condition to t

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tomas Vondra
On 06/02/15 21:39, Tom Lane wrote: I wrote: I'm a bit disturbed by that, because AFAICS from the plans, these queries did not involve any semi or anti joins, which should mean that the patch would not have changed the planner's behavior. After contemplating my navel for awhile, it occurred t

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
I wrote: > I'm a bit disturbed by that, because AFAICS from the plans, these queries > did not involve any semi or anti joins, which should mean that the patch > would not have changed the planner's behavior. After contemplating my navel for awhile, it occurred to me that maybe the patch's changes

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
Tomas Vondra writes: > On 06/02/15 16:37, Tom Lane wrote: >> It's possible that the change was due to random variation in ANALYZE >> statistics, in which case it was just luck. > I don't think so. I simply loaded the data, ran ANALYZE, and then simply > started either master or patched master. T

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tomas Vondra
On 06/02/15 16:37, Tom Lane wrote: Tomas Vondra writes: OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The results are good, IMHO. I'm a bit disturbed by that, because AFAICS from the plans, these queries did not involve any semi or anti joins, which should mean that the

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
Tomas Vondra writes: > OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The > results are good, IMHO. > With TPC-H, I've used 1GB and 4GB datasets, and I've seen no plan > changes at all. I don't plan to run the tests on larger data sets, I do > expect the behavior to remain

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tomas Vondra
On 06/02/15 01:47, Josh Berkus wrote: On 06/01/2015 03:22 PM, Tomas Vondra wrote: On 06/01/15 23:47, Josh Berkus wrote: On 06/01/2015 02:18 PM, Tom Lane wrote: Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's relea

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Josh Berkus
On 06/01/2015 03:22 PM, Tomas Vondra wrote: > > On 06/01/15 23:47, Josh Berkus wrote: >> On 06/01/2015 02:18 PM, Tom Lane wrote: >> >>> Anybody else want to speak for or against back-patching the patch as >>> posted? I intentionally didn't push it in before today's releases, >>> but I will push i

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Tomas Vondra
On 06/01/15 23:47, Josh Berkus wrote: On 06/01/2015 02:18 PM, Tom Lane wrote: Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's releases, but I will push it later this week if there are not objections. I would like

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Josh Berkus
On 06/01/2015 02:18 PM, Tom Lane wrote: > Tomas Vondra writes: >> On 06/01/15 00:08, Tom Lane wrote: >>> Attached is an incremental patch (on top of the previous one) to >>> allow startup cost of parameterized paths to be considered when the >>> relation is the RHS of a semi or anti join. It seems

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Tom Lane
Tomas Vondra writes: > On 06/01/15 00:08, Tom Lane wrote: >> Attached is an incremental patch (on top of the previous one) to >> allow startup cost of parameterized paths to be considered when the >> relation is the RHS of a semi or anti join. It seems reasonably clean >> except for one thing: log

Re: [HACKERS] nested loop semijoin estimates

2015-05-31 Thread Tomas Vondra
On 06/01/15 00:08, Tom Lane wrote: Tomas Vondra writes: On 05/30/15 23:16, Tom Lane wrote: Attached is a draft patch for that. It fixes the problem for me: Seems to be working OK, but I still do get a Bitmap Heap Scan there (but more about that later). Attached is an incremental patch (o

Re: [HACKERS] nested loop semijoin estimates

2015-05-31 Thread Tom Lane
Tomas Vondra writes: > On 05/30/15 23:16, Tom Lane wrote: >> Attached is a draft patch for that. It fixes the problem for me: > Seems to be working OK, but I still do get a Bitmap Heap Scan there (but > more about that later). Attached is an incremental patch (on top of the previous one) to al

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tomas Vondra
Hi, On 05/30/15 23:16, Tom Lane wrote: I wrote: So what this seems to mean is that for SEMI/ANTI join cases, we have to postpone all of the inner scan cost determination to final_cost_nestloop, so that we can do this differently depending on whether has_indexed_join_quals() is true. That's a l

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tom Lane
I wrote: > So what this seems to mean is that for SEMI/ANTI join cases, we have to > postpone all of the inner scan cost determination to final_cost_nestloop, > so that we can do this differently depending on whether > has_indexed_join_quals() is true. That's a little bit annoying because it > wil

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tomas Vondra
On 05/30/15 21:50, Tom Lane wrote: So what this seems to mean is that for SEMI/ANTI join cases, we have to postpone all of the inner scan cost determination to final_cost_nestloop, so that we can do this differently depending on whether has_indexed_join_quals() is true. That's a little bit annoy

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tom Lane
Tomas Vondra writes: > I wonder whether the > run_cost += inner_run_cost; > is actually correct, because this pretty much means we assume scanning > the whole inner relation (once). Wouldn't something like this be more > appropriate? > run_cost += inner_run_cost * inner_scan_fr

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tomas Vondra
On 05/30/15 03:52, Tomas Vondra wrote: On 05/30/15 01:20, Tomas Vondra wrote: Notice the cost - it's way lover than the previous plan (9.2 vs ~111k), yet this plan was not chosen. So either the change broke something (e.g. by violating some optimizer assumption), or maybe there's a bug somewh

Re: [HACKERS] nested loop semijoin estimates

2015-05-29 Thread Tomas Vondra
On 05/30/15 01:20, Tomas Vondra wrote: Notice the cost - it's way lover than the previous plan (9.2 vs ~111k), yet this plan was not chosen. So either the change broke something (e.g. by violating some optimizer assumption), or maybe there's a bug somewhere else ... After a bit more investig

[HACKERS] nested loop semijoin estimates

2015-05-29 Thread Tomas Vondra
Hi, while looking at this post from pgsql-performance about plan changes http://www.postgresql.org/message-id/flat/20150529095117.gb15...@hjp.at I noticed that initial_cost_nestloop() does this in (9.1, mentioned in the pgsql-performance post uses the same logic): if (jointype == JOIN_S