Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Josh Berkus
Kevin, > This would require capture of information beyond what I was thinking > about in terms of schema. Do you think we need to capture just index > type, or something more? Do you propose that we capture the pg_* > metadata related to every object referenced in the plan, every object > relate

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner
This would require capture of information beyond what I was thinking about in terms of schema. Do you think we need to capture just index type, or something more? Do you propose that we capture the pg_* metadata related to every object referenced in the plan, every object related to every table i

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Andrew Dunstan
Robert Treat wrote: On Tuesday 18 October 2005 18:05, Kevin Grittner wrote: Regarding the idea of a site where results could be posted and loaded into a database which would be available for public access -- I agree that would be great; however, my client is not willing to take that on. I

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner
Maybe we could associate a set of defaults to runtime_environment, and you would associate any overrides with the runtime_options. Does this address both your concerns? >>> Josh Berkus >>> Kevin, > When it gets downt to the detail, it may make sense to combine > or split some of these. For ex

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Robert Treat
On Tuesday 18 October 2005 18:05, Kevin Grittner wrote: > Regarding the idea of a site where results could be posted > and loaded into a database which would be available for > public access -- I agree that would be great; however, my > client is not willing to take that on. If anyone wants to > v

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner
I'm not interested in storing less information. I'm trying to make sure that all redundant information is justified. Since I plan to store the actual query text and the full EXPLAIN ANALYZE output, every column I pull out and put in another table is redundant data. The questions are, why do we h

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Josh Berkus
Kevin, > When it gets downt to the detail, it may make sense to combine > or split some of these. For example, runtime_options should > probably not have a column for each currently known option, > but a child table which maps to all non-default option values. I'm a little cautious about storing

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner
Summary of schema I'm considering. Comments welcome. When it gets downt to the detail, it may make sense to combine or split some of these. For example, runtime_options should probably not have a column for each currently known option, but a child table which maps to all non-default option value

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Josh Berkus
Kevin, > If we stored the actual queries and the EXPLAIN ANALYZE results (when > generated) in the database, what would be the purpose of the node_name, > db_object, and condition_detail columns? They don't seem like they > would be useful for statistical analysis, and it seems like the > informa

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner
If we stored the actual queries and the EXPLAIN ANALYZE results (when generated) in the database, what would be the purpose of the node_name, db_object, and condition_detail columns? They don't seem like they would be useful for statistical analysis, and it seems like the information would be more

Re: [HACKERS] A costing analysis tool

2005-10-18 Thread Kevin Grittner
Thanks to all who have been offering suggestions. I have been reading them and will try to incorporate as much as possible. I have already reworked that little brain-dead python script into something which uses a regular expression to pick off all of the data from each cost/timing line (including

Re: [HACKERS] A costing analysis tool

2005-10-17 Thread Jim C. Nasby
On Fri, Oct 14, 2005 at 03:34:43PM -0500, Kevin Grittner wrote: > of the two times as a reliability factor. Unfortunately, that > means doubling the number of cache flushes, which is likely > to be the most time-consuming part of running the tests. On > the bright side, we would capture the top l

Re: [HACKERS] A costing analysis tool

2005-10-17 Thread Jim C. Nasby
On Sat, Oct 15, 2005 at 04:04:52PM +0200, Martijn van Oosterhout wrote: > On Fri, Oct 14, 2005 at 03:34:43PM -0500, Kevin Grittner wrote: > > Of course, if running with EXPLAIN ANALYZE significantly > > distorts the run time, the whole effort is doomed at the outset. > > Can you quantify the distor

Re: [HACKERS] A costing analysis tool

2005-10-17 Thread Jim C. Nasby
On Fri, Oct 14, 2005 at 02:37:37PM -0400, Tom Lane wrote: > "Kevin Grittner" <[EMAIL PROTECTED]> writes: > > I propose capturing only three values from the output of explain > > analyze, and saving it with many columns of context information. > > You really have to capture the rowcounts (est and a

Re: [HACKERS] A costing analysis tool

2005-10-17 Thread Jim C. Nasby
On Sun, Oct 16, 2005 at 02:23:41PM -0700, Josh Berkus wrote: > Greg, > > > Or something like that. It might require breaking random_page_cost into two > > or three different parameters that would normally have the same cost but > > aren't handled the same, like random_heap_cost, random_leaf_cost,

Re: [HACKERS] A costing analysis tool

2005-10-17 Thread Jim C. Nasby
On Fri, Oct 14, 2005 at 05:14:43PM +0200, Martijn van Oosterhout wrote: > On Thu, Oct 13, 2005 at 05:39:32PM -0500, Kevin Grittner wrote: > > That said, there's certainly overlap between your effort and > > what I'm going to be developing. Do you have anything from > > your work which might save m

Re: [HACKERS] A costing analysis tool

2005-10-16 Thread Greg Stark
Josh Berkus writes: > Greg, > > > Or something like that. It might require breaking random_page_cost into two > > or three different parameters that would normally have the same cost but > > aren't handled the same, like random_heap_cost, random_leaf_cost, and > > random_nonleaf_cost. > > Gods

Re: [HACKERS] A costing analysis tool

2005-10-16 Thread Josh Berkus
Kevin, > I have to keep a very narrow focus on this, or there is likely that > nothing will come of it.  The particular area which is my target > here is the accuracy of the cost values on the subplans > considered by the optimizer. Sure. What the rest of us are focused on is helping you build a

Re: [HACKERS] A costing analysis tool

2005-10-16 Thread Josh Berkus
Greg, > Or something like that. It might require breaking random_page_cost into two > or three different parameters that would normally have the same cost but > aren't handled the same, like random_heap_cost, random_leaf_cost, and > random_nonleaf_cost. Gods forbid. People don't know how to use

Re: [HACKERS] A costing analysis tool

2005-10-16 Thread Martijn van Oosterhout
On Sat, Oct 15, 2005 at 06:48:25PM -0400, Tom Lane wrote: > > So, what's good for battery and power usage is bad for accurate > > timings. Basically, on Linux it would seriously underestimate the time > > for blocking system calls on an otherwise idle system. So, it works for > > Windows because th

Re: [HACKERS] A costing analysis tool

2005-10-15 Thread Tom Lane
Martijn van Oosterhout writes: > Interestingly, I notice the windows port of PostgreSQL uses the > QueryPerformanceCounter() function. I tried playing with it under linux > and found that Linux suspends the CPU while waiting for things to > happen. So: > sleep(1)~ 20 million cycle

Re: [HACKERS] A costing analysis tool

2005-10-15 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Hardly --- how will you choose the best subplans if you don't calculate > their costs? Uhm, good point. but as you say not really a problem. > I'm also a bit suspicious of the "it's all a linear equation" premise, > because the fact of the matter is that th

Re: [HACKERS] A costing analysis tool

2005-10-15 Thread Martijn van Oosterhout
On Sat, Oct 15, 2005 at 05:53:45PM -0400, Greg Stark wrote: > Martijn van Oosterhout writes: > > > This is unfortunate because EXPLAIN ANALYZE is an immensly useful tool, > > as far as it goes. I've pondered if some kind of userspace timing > > mechanism could be introduced (possibly using builti

Re: [HACKERS] A costing analysis tool

2005-10-15 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > If the optimizer didn't collapse the cost for each node into a single value > and instead retained the individual parameters at each node it could bubble > those values all the way up to the surface. Then use the configuration options > like random_page_cost

Re: [HACKERS] A costing analysis tool

2005-10-15 Thread Greg Stark
Martijn van Oosterhout writes: > This is unfortunate because EXPLAIN ANALYZE is an immensly useful tool, > as far as it goes. I've pondered if some kind of userspace timing > mechanism could be introduced (possibly using builtin CPU cycle > counters) to reduce the cost. It does, however, remain a

Re: [HACKERS] A costing analysis tool

2005-10-15 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > More generally, I think that depending entirely on EXPLAIN ANALYZE > numbers is a bad idea, because the overhead of EXPLAIN ANALYZE is both > significant and variable depending on the plan structure. The numbers > that I think we must capture are the top-le

Re: [HACKERS] A costing analysis tool

2005-10-15 Thread Martijn van Oosterhout
On Fri, Oct 14, 2005 at 03:34:43PM -0500, Kevin Grittner wrote: > Of course, if running with EXPLAIN ANALYZE significantly > distorts the run time, the whole effort is doomed at the outset. > Can you quantify the distortion you mention? Do you know To do the calculations for EXPLAIN ANALYZE, Post

Re: [HACKERS] A costing analysis tool

2005-10-14 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > Dang. Obviously, that's inverted. Also, I'd need to > factor in the setup time. Bother. Are you sure we can't > just make sure the test scripts operate against tables > with accurate statistics? Well, the point of my comment was that you should re

Re: [HACKERS] A costing analysis tool

2005-10-14 Thread Kevin Grittner
Dang. Obviously, that's inverted. Also, I'd need to factor in the setup time. Bother. Are you sure we can't just make sure the test scripts operate against tables with accurate statistics? >>> "Kevin Grittner" <[EMAIL PROTECTED]> 10/14/05 3:34 PM >>> The ratio I've been looking at should perha

Re: [HACKERS] A costing analysis tool

2005-10-14 Thread Kevin Grittner
I have to keep a very narrow focus on this, or there is likely that nothing will come of it. The particular area which is my target here is the accuracy of the cost values on the subplans considered by the optimizer. As previously stated, we're getting hurt by cases where the optimizer looks at t

Re: [HACKERS] A costing analysis tool

2005-10-14 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > I propose capturing only three values from the output of explain > analyze, and saving it with many columns of context information. You really have to capture the rowcounts (est and actual) too. Otherwise you can't tell if it's a costing problem or a

Re: [HACKERS] A costing analysis tool

2005-10-14 Thread Kevin Grittner
I think I get your point now. If I understand it, you could accomplish what you want under my rough ("exists only in my head so far") design by creating your own test cases and putting together a script to run just those. I would be exremely leary of comparing tests against a database under load

Re: [HACKERS] A costing analysis tool

2005-10-14 Thread Josh Berkus
Kevin, > It sounds as though you are more focused on picking up costing > problems which happen during production -- which is clearly > valuable, but addresses a somewhat different set of needs than > I was looking at. That said, it seems like there is potential to share > signifcant code between

Re: [HACKERS] A costing analysis tool

2005-10-14 Thread Martijn van Oosterhout
On Thu, Oct 13, 2005 at 05:39:32PM -0500, Kevin Grittner wrote: > That said, there's certainly overlap between your effort and > what I'm going to be developing. Do you have anything from > your work which might save me some time? Not really. I got stuck in the query design phase. I didn't even g

Re: [HACKERS] A costing analysis tool

2005-10-13 Thread Kevin Grittner
Thanks for the well wishes. It sounds like you were addressing a slightly different problem -- more ambitious than what I propose tackle do as a first step. If I understand you, you were trying to develop your own predictive costing formulas based on plans. I'm merely talking about a tool to eval

Re: [HACKERS] A costing analysis tool

2005-10-13 Thread Martijn van Oosterhout
On Thu, Oct 13, 2005 at 01:52:10PM -0500, Kevin Grittner wrote: > Thanks, Josh, for the feedback. > > It sounds as though you are more focused on picking up costing > problems which happen during production -- which is clearly > valuable, but addresses a somewhat different set of needs than > I wa

Re: [HACKERS] A costing analysis tool

2005-10-13 Thread Kevin Grittner
Thanks, Josh, for the feedback. It sounds as though you are more focused on picking up costing problems which happen during production -- which is clearly valuable, but addresses a somewhat different set of needs than I was looking at. That said, it seems like there is potential to share signifca

Re: [HACKERS] A costing analysis tool

2005-10-13 Thread Josh Berkus
Kevin, I'm looking at trying to fix some clear flaws in costing which cause of our real-world queries to choose sub-optimal plans under PostgreSQL. It's clear that there needs to be a tool to analyze the accuracy of costing for a variety of queries, both to direct any efforts to fix problems and

Re: [HACKERS] A costing analysis tool

2005-10-13 Thread Kevin Grittner
Ouch! I just remembered locale and character sets and encoding. I can't even begin to get my head around what to do with those, unless it is just to make the tool agnostic regarding those issues and test against a variety of setups. Does that seem adequate? I flash back to my first attempts to u

Re: [HACKERS] A costing analysis tool

2005-10-12 Thread Kevin Grittner
Good points, Tom. (I wish my client's email software supported quoting so that I could post replies closer to your points. Sorry 'bout that.) I tried searching the archives, though, and the words I could think to search with generated so many hits that it seemed more or less like a sequential se

Re: [HACKERS] A costing analysis tool

2005-10-12 Thread Kevin Grittner
Yes I have looked at the TODO list. There is arguably a relationship to: * Have EXPLAIN ANALYZE highlight poor optimizer estimates * Log queries where the optimizer row estimates were dramatically different from the number of rows actually found? Neither of these, however, provides a systemat

Re: [HACKERS] A costing analysis tool

2005-10-12 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > Note that I'm talking about a tool strictly to check the accuracy of > the estimated costs of plans chosen by the planner, nothing else. We could definitely do with some infrastructure for testing this. I concur with Bruce's suggestion that you should

Re: [HACKERS] A costing analysis tool

2005-10-12 Thread Bruce Momjian
Have you looked at the TODO list to see our previous ideas on tuning diagnotics? --- Kevin Grittner wrote: > I'm looking at trying to fix some clear flaws in costing which cause > of our real-world queries to choose sub-opti

[HACKERS] A costing analysis tool

2005-10-12 Thread Kevin Grittner
I'm looking at trying to fix some clear flaws in costing which cause of our real-world queries to choose sub-optimal plans under PostgreSQL. It's clear that there needs to be a tool to analyze the accuracy of costing for a variety of queries, both to direct any efforts to fix problems and to test f