Re: [HACKERS] add_path optimization

2010-02-27 Thread Robert Haas
On Fri, Feb 26, 2010 at 10:07 PM, Bruce Momjian wrote: > Did this ever get applied/resolved? No, I never went back and tried to fix the brokenness Tom found. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgre

Re: [HACKERS] add_path optimization

2010-02-26 Thread Bruce Momjian
Did this ever get applied/resolved? --- Robert Haas wrote: > I've been doing some benchmarking and profiling on the PostgreSQL > query analyzer, and it seems that (at least for the sorts of queries > that I typically run) th

Re: [HACKERS] add_path optimization

2009-03-02 Thread Kevin Grittner
>>> Tom Lane wrote: > After a lot of distractions, I've finished applying the planner fixes > that seem necessary in view of your report about poorer planning in 8.4 > than 8.3. When you have a chance, it would be useful to do a thorough > test of CVS HEAD on your data and query mix --- are ther

Re: [HACKERS] add_path optimization

2009-02-28 Thread Robert Haas
On Fri, Feb 27, 2009 at 11:06 PM, Tom Lane wrote: > One other thought to roll around in your head: at the time that the > current add_path logic was designed, compare_pathkeys was ungodly > expensive, which is why the code tries to compare costs first. > We've since introduced the "canonical pathk

Re: [HACKERS] add_path optimization

2009-02-27 Thread Tom Lane
Robert Haas writes: >> I first tried just the compare_fuzzy_path_costs() change and really >> couldn't measure any reliable difference.  oprofile output for CVS HEAD >> looks like > Well, there's obviously something different between your case and > mine, because in my query add_path was the #1 C

Re: [HACKERS] add_path optimization

2009-02-27 Thread Robert Haas
Thanks for taking a look at it. > I first tried just the compare_fuzzy_path_costs() change and really > couldn't measure any reliable difference.  oprofile output for CVS HEAD > looks like Well, there's obviously something different between your case and mine, because in my query add_path was the

Re: [HACKERS] add_path optimization

2009-02-27 Thread Tom Lane
[ This patch is on the 2009-First commitfest list, but there was earlier discussion to the effect that it might be a good idea to include in 8.4, so I made some time to look at it. ] Robert Haas writes: > I've been doing some benchmarking and profiling on the PostgreSQL > query analyzer, and it s

Re: [HACKERS] add_path optimization

2009-02-27 Thread Tom Lane
After a lot of distractions, I've finished applying the planner fixes that seem necessary in view of your report about poorer planning in 8.4 than 8.3. When you have a chance, it would be useful to do a thorough test of CVS HEAD on your data and query mix --- are there any other places where we ha

Re: [HACKERS] add_path optimization

2009-02-09 Thread Kevin Grittner
>>> Bruce Momjian wrote: > Where are we on this: the original patch, and Kevin's slow queries? Robert's patch is not the cause of the 8.4 problems with my queries, and (as Robert pointed out) a separate thread has been started to discuss those issues. >From my perspective, Robert's patch has i

Re: [HACKERS] add_path optimization

2009-02-07 Thread Robert Haas
On Sat, Feb 7, 2009 at 2:44 PM, Bruce Momjian wrote: >> Yikes! The impact of the patch is about what I'd expect, but the fact >> that planning time has nearly tripled is... way poor. Can you repost >> the query and the EXPLAIN output for 8.3.5 and CVS HEAD? > > Where are we on this: the original

Re: [HACKERS] add_path optimization

2009-02-07 Thread Bruce Momjian
Robert Haas wrote: > On Mon, Feb 2, 2009 at 8:10 PM, Kevin Grittner > wrote: > Robert Haas wrote: > >> running this 5 times each on several queries, > >> dropping top and bottom results. > > > > Running a complex query (posted in previous threads, runs about > > 300,000 time per day in a pro

Re: [HACKERS] add_path optimization

2009-02-04 Thread Kevin Grittner
>>> Tom Lane wrote: > That doesn't seem to be the only issue in your example, but it's > definitely one of 'em. FWIW, the pattern causing the problem here is a pretty common one in court business logic: join (often outer join or an exists test) to a set of candidate rows WHERE NOT EXISTS the sa

Re: [HACKERS] add_path optimization

2009-02-04 Thread Tom Lane
I wrote: > Now, since I'm running without any stats, it might be that it's > estimating similar costs for the one-key and two-key merges; but I don't > see why that would happen for you. Off to do some debugging. ... well, actually, it's because I blew off applying any cost correction for this ca

Re: [HACKERS] add_path optimization

2009-02-04 Thread Kevin Grittner
>>> Tom Lane wrote: > Hmm ... one of the things that struck me as odd was that it was doing a > merge join on just the countyNo, which is presumably very far from > unique. There are 72 possible values for any columns in that domain. In most large tables, Milwaukee County (value 40) is used in

Re: [HACKERS] add_path optimization

2009-02-04 Thread Tom Lane
"Kevin Grittner" writes: > Tom Lane wrote: >> It would be worth checking whether you get the same plan without >> the patch (you should, but let's check). > Same plan. Yeah, I just managed to reproduce a similar behavior in unpatched HEAD. Now, since I'm running without any stats, it might be

Re: [HACKERS] add_path optimization

2009-02-04 Thread Tom Lane
Robert Haas writes: > Oh, dear. If this turns out to be my bug Tom will kick my ass! Hmm ... one of the things that struck me as odd was that it was doing a merge join on just the countyNo, which is presumably very far from unique. Testing the query here with Kevin's schema but no data, I get

Re: [HACKERS] add_path optimization

2009-02-04 Thread Kevin Grittner
>>> Tom Lane wrote: > It would be worth checking whether you get the same plan without > the patch (you should, but let's check). Same plan. (Robert can relax, and David can forget about those tickets.) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make c

Re: [HACKERS] add_path optimization

2009-02-04 Thread David E. Wheeler
On Feb 4, 2009, at 9:25 AM, Robert Haas wrote: Oh, dear. If this turns out to be my bug Tom will kick my ass! Can I buy tickets to this event? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/

Re: [HACKERS] add_path optimization

2009-02-04 Thread Robert Haas
On Wed, Feb 4, 2009 at 12:15 PM, Kevin Grittner wrote: Tom Lane wrote: >> there's some sort of logic bug here someplace. > > Keep in mind that this is running with the patch that started this > thread. I didn't try actually running this query on 8.4devel without > the patch. Should I kill

Re: [HACKERS] add_path optimization

2009-02-04 Thread Tom Lane
"Kevin Grittner" writes: > Tom Lane wrote: >> there's some sort of logic bug here someplace. > Keep in mind that this is running with the patch that started this > thread. I didn't try actually running this query on 8.4devel without > the patch. Should I kill this query, revert the software

Re: [HACKERS] add_path optimization

2009-02-04 Thread Kevin Grittner
>>> Tom Lane wrote: > there's some sort of logic bug here someplace. Keep in mind that this is running with the patch that started this thread. I didn't try actually running this query on 8.4devel without the patch. Should I kill this query, revert the software to pre-patch, and try it again

Re: [HACKERS] add_path optimization

2009-02-04 Thread Tom Lane
"Kevin Grittner" writes: > Well, of the cases which are selected based on the other criteria, > there would be about one CaseDispo row each. The main selection > criterion is the Party.searchName, with various security limitations > added. Where one or more CaseDispo rows exist (it's only includ

Re: [HACKERS] add_path optimization

2009-02-04 Thread Kevin Grittner
>>> Tom Lane wrote: > SELECT ... FROM "CaseDispo" "CD" > WHERE (NOT (EXISTS > ( > SELECT > 1 > FROM > "CaseDispo" "CD2" > WHERE ( > ("CD2"."caseNo" = "CD"."caseNo") > AND ("CD2"."countyNo" = "CD"."countyNo") > AND ("CD2"."dispoDate" > "CD"."dispoDate")) > ))) > > which in fact the planner though

Re: [HACKERS] add_path optimization

2009-02-04 Thread Tom Lane
"Kevin Grittner" writes: > Tom Lane wrote: >> If you can attach to the backend with gdb, try "bt", then "cont", >> then wait a few seconds, then control-C and "bt" again. Repeat >> five or ten times and see if there's any consistency to the traces. > Attached. Hmm, it seems to be spending al

Re: [HACKERS] add_path optimization

2009-02-04 Thread Kevin Grittner
>>> Tom Lane wrote: > If you can attach to the backend with gdb, try "bt", then "cont", > then wait a few seconds, then control-C and "bt" again. Repeat > five or ten times and see if there's any consistency to the traces. Attached. -Kevin kgri...@athena:~> gdb /usr/local/pgsql-8.4devel-200

Re: [HACKERS] add_path optimization

2009-02-04 Thread Tom Lane
"Kevin Grittner" writes: > It's been about 23 hours and it's still running. No apparent memory > leakage. No significant disk activity. One CPU pegged (of the 16 on > the machine). Hmmm ... I wonder if it could be stuck in an infinite loop? It's probably just a horribly bad plan choice, but .

Re: [HACKERS] add_path optimization

2009-02-04 Thread Jonah H. Harris
On Wed, Feb 4, 2009 at 10:12 AM, Greg Stark wrote: > On Wed, Feb 4, 2009 at 3:07 PM, Kevin Grittner > > It's been about 23 hours and it's still running. > > @snide(Gee, it sure would be nice if we continued with that > explain-in-progress patch I had sent in earlier...:) Agreed. -- Jonah H. H

Re: [HACKERS] add_path optimization

2009-02-04 Thread Greg Stark
On Wed, Feb 4, 2009 at 3:07 PM, Kevin Grittner > It's been about 23 hours and it's still running. @snide(Gee, it sure would be nice if we continued with that explain-in-progress patch I had sent in earlier...:) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To mak

Re: [HACKERS] add_path optimization

2009-02-04 Thread Kevin Grittner
>>> I wrote: Tom Lane wrote: >> Can you let it run to completion? Without explain analyze results >> it's going to be pretty difficult to isolate the problem. > > Barring some currently unforseen need to switch it into service to > back the web site, yes. It's been about 23 hours and i

Re: [HACKERS] add_path optimization

2009-02-03 Thread Kevin Grittner
>>> Tom Lane wrote: > Can you let it run to completion? Without explain analyze results > it's going to be pretty difficult to isolate the problem. Barring some currently unforseen need to switch it into service to back the web site, yes. -Kevin -- Sent via pgsql-hackers mailing list (pgsq

Re: [HACKERS] add_path optimization

2009-02-03 Thread Tom Lane
"Kevin Grittner" writes: > It's now been an hour and 30 minutes; so, while 8.4 does a much better > job of estimating how many rows will be returned, the plan it > generates is much slower for this query. Can you let it run to completion? Without explain analyze results it's going to be pretty d

Re: [HACKERS] add_path optimization

2009-02-03 Thread Kevin Grittner
>>> I wrote: > I tried the same run on 8.4devel and it is still running after > 20 minutes. I will let it cook for a while. It's now been an hour and 30 minutes; so, while 8.4 does a much better job of estimating how many rows will be returned, the plan it generates is much slower for this quer

Re: [HACKERS] add_path optimization

2009-02-03 Thread Kevin Grittner
>>> "Kevin Grittner" wrote: > Attached is EXPLAIN > ANALYZE output from the 8.3.5 database I dumped from for my 8.4devel > tests. Actually, that one is from the sibling machine which is in production. Attached is the one on standby where I've been running the rest of this. Apparently, differ

Re: [HACKERS] add_path optimization

2009-02-03 Thread Kevin Grittner
>>> Robert Haas wrote: > I had a suspicion we were going to see something like this. You're > using several NOT EXISTS clauses and 8.4devel is converting those into > Anti Joins. Aside from the longer planning time, the resulting plan > appears to have a much higher estimated cost, so I'm suspic

Re: [HACKERS] add_path optimization

2009-02-03 Thread Tom Lane
Robert Haas writes: > I had a suspicion we were going to see something like this. You're > using several NOT EXISTS clauses and 8.4devel is converting those into > Anti Joins. Aside from the longer planning time, the resulting plan > appears to have a much higher estimated cost, so I'm suspicious

Re: [HACKERS] add_path optimization

2009-02-03 Thread Tom Lane
"Kevin Grittner" writes: > [ test case ] It looks to me like the reason for the planning time difference is that this query contains four NOT EXISTS subqueries, which 8.3 was not very smart about but 8.4 has converted into antijoins. That gives it more flexibility to consider different join orde

Re: [HACKERS] add_path optimization

2009-02-03 Thread Robert Haas
On Tue, Feb 3, 2009 at 10:17 AM, Kevin Grittner wrote: >> We're going to need to see the test case, because I don't see that in >> some simple tests here. > Plans from 8.3.5 and 8.4devel attached. > > If you need something else, let me know. I had a suspicion we were going to see something like t

Re: [HACKERS] add_path optimization

2009-02-03 Thread Kevin Grittner
>>> I wrote: > Ran it with this: > effective_cache_size = 100GB Actually, the timings shown in the previous post were run with the default for this setting. I updated it after yesterday evening's tests when I noticed I'd missed it, but had to leave before I could rerun the tests. I forgot th

Re: [HACKERS] add_path optimization

2009-02-03 Thread Kevin Grittner
>>> Robert Haas wrote: > FYI, I retested my queries on REL8_3_STABLE and the results were not > all that different from CVS HEAD. So the problem is apparently > specific to something your query is doing that mine isn't., rather > than a general slowdown in planning (or else one of us goofed up t

Re: [HACKERS] add_path optimization

2009-02-03 Thread Kevin Grittner
>>> Tom Lane wrote: > In fact, the only reason to care whether there is any data in the DB > *at all* is that you need some realistic content in pg_statistic. > So it should be possible to set up a planner test DB with very little > data bulk, which would surely make testing a lot less painful.

Re: [HACKERS] add_path optimization

2009-02-02 Thread Tom Lane
Robert Haas writes: > I'm interested, but I need maybe a 1GB data set, or smaller. The > thing that we are benchmarking is the planner, and planning times are > related to the complexity of the database and the accompanying > queries, not the raw volume of data. In fact, the only reason to care

Re: [HACKERS] add_path optimization

2009-02-02 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote: > I'm interested, but I need maybe a 1GB data set, or smaller. The > thing that we are benchmarking is the planner, and planning times are > related to the complexity of the database and the accompanying > queries, not the raw volume of data. (It's not

Re: [HACKERS] add_path optimization

2009-02-02 Thread Robert Haas
> A good data set, plus complex queries against it, might be the data from > the US Census, specifically the TIGER data and the TIGER geocoder. I've > been following this thread with the intention of putting together a > large-data test set, but I just havn't found the time to yet. Right now > th

Re: [HACKERS] add_path optimization

2009-02-02 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Robert Haas writes: > > Yikes! The impact of the patch is about what I'd expect, but the fact > > that planning time has nearly tripled is... way poor. > > We're going to need to see the test case, because I don't see that in > some simple tests here. A

Re: [HACKERS] add_path optimization

2009-02-02 Thread Tom Lane
Robert Haas writes: > Yikes! The impact of the patch is about what I'd expect, but the fact > that planning time has nearly tripled is... way poor. We're going to need to see the test case, because I don't see that in some simple tests here. regards, tom lane -- Sent v

Re: [HACKERS] add_path optimization

2009-02-02 Thread Robert Haas
>> Running a complex query (posted in previous threads, runs about >> 300,000 time per day in a production web application), I got these >> timings on a production quality machine (4 quad CPU chips, that is 16 >> CPUs like this: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz, 128 GB RAM, big >> RAID with BB

Re: [HACKERS] add_path optimization

2009-02-02 Thread Robert Haas
On Mon, Feb 2, 2009 at 8:10 PM, Kevin Grittner wrote: Robert Haas wrote: >> running this 5 times each on several queries, >> dropping top and bottom results. > > Running a complex query (posted in previous threads, runs about > 300,000 time per day in a production web application), I got the

Re: [HACKERS] add_path optimization

2009-02-02 Thread Kevin Grittner
>>> Robert Haas wrote: > running this 5 times each on several queries, > dropping top and bottom results. Running a complex query (posted in previous threads, runs about 300,000 time per day in a production web application), I got these timings on a production quality machine (4 quad CPU chips,

Re: [HACKERS] add_path optimization

2009-02-02 Thread Grzegorz Jaskiewicz
On 2 Feb 2009, at 14:50, Robert Haas wrote: well, true - but also, statically allocated table, without any predefined size (with #DEFINE) , and no boundary check - is bad as well. I suppose , this code is easy enough to let it be with your changes, but I would still call it not pretty. W

Re: [HACKERS] add_path optimization

2009-02-02 Thread Robert Haas
> well, true - but also, statically allocated table, without any predefined > size (with #DEFINE) , and no boundary check - is bad as well. > I suppose , this code is easy enough to let it be with your changes, but I > would still call it not pretty. Well, it might merit a comment. > Actually - i

Re: [HACKERS] add_path optimization

2009-02-02 Thread Grzegorz Jaskiewicz
On 1 Feb 2009, at 21:35, Robert Haas wrote: On Sun, Feb 1, 2009 at 3:25 PM, Grzegorz Jaskiewicz > wrote: I don't like the fact that you hardcoded that here. I know that you are trying to pass on few calls in one go here, but still... ugly. Well, I think you'll find that using a dynamically

Re: [HACKERS] add_path optimization

2009-02-01 Thread Robert Haas
On Sun, Feb 1, 2009 at 3:25 PM, Grzegorz Jaskiewicz wrote: > I don't like the fact that you hardcoded that here. I know that you are > trying to pass on few calls in one go here, but still... ugly. Well, I think you'll find that using a dynamically sized data structure destroys the possibility o

Re: [HACKERS] add_path optimization

2009-02-01 Thread Grzegorz Jaskiewicz
disclaimer: I don't know that bit of postgresql code, in fact - this is the first time I see it. *** a/src/backend/optimizer/path/joinpath.c --- b/src/backend/optimizer/path/joinpath.c *** *** 473,478 match_unsorted_outer(PlannerInfo *root, --- 473,481 if

Re: [HACKERS] add_path optimization

2009-02-01 Thread Robert Haas
On Sun, Feb 1, 2009 at 1:34 PM, Jaime Casanova wrote: > On Sat, Jan 31, 2009 at 11:37 PM, Robert Haas wrote: >> I've been doing some benchmarking and profiling on the PostgreSQL >> query analyzer, and it seems that (at least for the sorts of queries >> that I typically run) the dominant cost is a

Re: [HACKERS] add_path optimization

2009-02-01 Thread Jaime Casanova
On Sat, Jan 31, 2009 at 11:37 PM, Robert Haas wrote: > I've been doing some benchmarking and profiling on the PostgreSQL > query analyzer, and it seems that (at least for the sorts of queries > that I typically run) the dominant cost is add_path(). I've been able > to find two optimizations that

Re: [HACKERS] add_path optimization

2009-02-01 Thread Robert Haas
On Sun, Feb 1, 2009 at 12:03 PM, David Fetter wrote: > On Sat, Jan 31, 2009 at 11:37:39PM -0500, Robert Haas wrote: >> I've been doing some benchmarking and profiling on the PostgreSQL >> query analyzer, and it seems that (at least for the sorts of queries >> that I typically run) the dominant cos

Re: [HACKERS] add_path optimization

2009-02-01 Thread David Fetter
On Sat, Jan 31, 2009 at 11:37:39PM -0500, Robert Haas wrote: > I've been doing some benchmarking and profiling on the PostgreSQL > query analyzer, and it seems that (at least for the sorts of queries > that I typically run) the dominant cost is add_path(). I've been > able to find two optimization

[HACKERS] add_path optimization

2009-01-31 Thread Robert Haas
I've been doing some benchmarking and profiling on the PostgreSQL query analyzer, and it seems that (at least for the sorts of queries that I typically run) the dominant cost is add_path(). I've been able to find two optimizations that seem to help significantly: 1. add_path() often calls compare