Kevin,
> This would require capture of information beyond what I was thinking
> about in terms of schema. Do you think we need to capture just index
> type, or something more? Do you propose that we capture the pg_*
> metadata related to every object referenced in the plan, every object
> relate
This would require capture of information beyond what I was thinking
about in terms of schema. Do you think we need to capture just index
type, or something more? Do you propose that we capture the pg_*
metadata related to every object referenced in the plan, every object
related to every table i
Robert Treat wrote:
On Tuesday 18 October 2005 18:05, Kevin Grittner wrote:
Regarding the idea of a site where results could be posted
and loaded into a database which would be available for
public access -- I agree that would be great; however, my
client is not willing to take that on. I
Maybe we could associate a set of defaults with runtime_environment,
and you would associate any overrides with the runtime_options.
Does this address both your concerns?
>>> Josh Berkus >>>
Kevin,
> When it gets down to the detail, it may make sense to combine
> or split some of these. For ex
On Tuesday 18 October 2005 18:05, Kevin Grittner wrote:
> Regarding the idea of a site where results could be posted
> and loaded into a database which would be available for
> public access -- I agree that would be great; however, my
> client is not willing to take that on. If anyone wants to
> v
I'm not interested in storing less information. I'm trying to make sure
that all redundant information is justified. Since I plan to store the
actual query text and the full EXPLAIN ANALYZE output, every
column I pull out and put in another table is redundant data. The
questions are, why do we h
Kevin,
> When it gets down to the detail, it may make sense to combine
> or split some of these. For example, runtime_options should
> probably not have a column for each currently known option,
> but a child table which maps to all non-default option values.
I'm a little cautious about storing
Summary of schema I'm considering. Comments welcome.
When it gets down to the detail, it may make sense to combine
or split some of these. For example, runtime_options should
probably not have a column for each currently known option,
but a child table which maps to all non-default option value
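A minimal sketch of the child-table idea, just to make the intent concrete
(all names below are placeholders rather than a settled design, and "conn"
is assumed to be an ordinary DB-API connection such as psycopg2):

    # Placeholder schema: one row per non-default option per test run,
    # instead of one column per currently known option.
    CHILD_TABLE_DDL = """
    CREATE TABLE runtime_option (
        test_run_id   integer NOT NULL,
        option_name   text    NOT NULL,
        option_value  text    NOT NULL,
        PRIMARY KEY (test_run_id, option_name)
    );
    """

    def record_overrides(conn, test_run_id, overrides):
        """Store only the settings that differ from the agreed defaults."""
        # "conn" is any DB-API connection (e.g. psycopg2) -- assumed here.
        cur = conn.cursor()
        for name, value in sorted(overrides.items()):
            cur.execute(
                "INSERT INTO runtime_option VALUES (%s, %s, %s)",
                (test_run_id, name, value))
        conn.commit()

    # e.g. record_overrides(conn, 42, {"random_page_cost": "2.0"})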
Kevin,
> If we stored the actual queries and the EXPLAIN ANALYZE results (when
> generated) in the database, what would be the purpose of the node_name,
> db_object, and condition_detail columns? They don't seem like they
> would be useful for statistical analysis, and it seems like the
> informa
If we stored the actual queries and the EXPLAIN ANALYZE results (when
generated) in the database, what would be the purpose of the node_name,
db_object, and condition_detail columns? They don't seem like they
would be useful for statistical analysis, and it seems like the
information would be more
Thanks to all who have been offering suggestions. I have been
reading them and will try to incorporate as much as possible.
I have already reworked that little brain-dead python script into
something which uses a regular expression to pick off all of the
data from each cost/timing line (including
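The expression involved is roughly along these lines (a simplified
reconstruction for illustration, not the actual script):

    import re

    # Pulls the estimated and actual figures out of one EXPLAIN ANALYZE
    # plan line, e.g. (a single line in the real output, wrapped here):
    #   Seq Scan on t  (cost=0.00..155.00 rows=10000 width=4)
    #     (actual time=0.018..12.345 rows=10000 loops=1)
    COST_LINE = re.compile(
        r"cost=(?P<startup_cost>[\d.]+)\.\.(?P<total_cost>[\d.]+) "
        r"rows=(?P<est_rows>\d+) width=(?P<width>\d+)\) "
        r"\(actual time=(?P<actual_startup>[\d.]+)\.\.(?P<actual_total>[\d.]+) "
        r"rows=(?P<actual_rows>\d+) loops=(?P<loops>\d+)\)")

    def parse_plan_line(line):
        m = COST_LINE.search(line)
        return {k: float(v) for k, v in m.groupdict().items()} if m else None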
On Fri, Oct 14, 2005 at 03:34:43PM -0500, Kevin Grittner wrote:
> of the two times as a reliability factor. Unfortunately, that
> means doubling the number of cache flushes, which is likely
> to be the most time-consuming part of running the tests. On
> the bright side, we would capture the top l
On Sat, Oct 15, 2005 at 04:04:52PM +0200, Martijn van Oosterhout wrote:
> On Fri, Oct 14, 2005 at 03:34:43PM -0500, Kevin Grittner wrote:
> > Of course, if running with EXPLAIN ANALYZE significantly
> > distorts the run time, the whole effort is doomed at the outset.
> > Can you quantify the distor
On Fri, Oct 14, 2005 at 02:37:37PM -0400, Tom Lane wrote:
> "Kevin Grittner" <[EMAIL PROTECTED]> writes:
> > I propose capturing only three values from the output of explain
> > analyze, and saving it with many columns of context information.
>
> You really have to capture the rowcounts (est and a
On Sun, Oct 16, 2005 at 02:23:41PM -0700, Josh Berkus wrote:
> Greg,
>
> > Or something like that. It might require breaking random_page_cost into two
> > or three different parameters that would normally have the same cost but
> > aren't handled the same, like random_heap_cost, random_leaf_cost,
On Fri, Oct 14, 2005 at 05:14:43PM +0200, Martijn van Oosterhout wrote:
> On Thu, Oct 13, 2005 at 05:39:32PM -0500, Kevin Grittner wrote:
> > That said, there's certainly overlap between your effort and
> > what I'm going to be developing. Do you have anything from
> > your work which might save m
Josh Berkus writes:
> Greg,
>
> > Or something like that. It might require breaking random_page_cost into two
> > or three different parameters that would normally have the same cost but
> > aren't handled the same, like random_heap_cost, random_leaf_cost, and
> > random_nonleaf_cost.
>
> Gods
Kevin,
> I have to keep a very narrow focus on this, or it is likely that
> nothing will come of it. The particular area which is my target
> here is the accuracy of the cost values on the subplans
> considered by the optimizer.
Sure. What the rest of us are focused on is helping you build a
Greg,
> Or something like that. It might require breaking random_page_cost into two
> or three different parameters that would normally have the same cost but
> aren't handled the same, like random_heap_cost, random_leaf_cost, and
> random_nonleaf_cost.
Gods forbid. People don't know how to use
On Sat, Oct 15, 2005 at 06:48:25PM -0400, Tom Lane wrote:
> > So, what's good for battery and power usage is bad for accurate
> > timings. Basically, on Linux it would seriously underestimate the time
> > for blocking system calls on an otherwise idle system. So, it works for
> > Windows because th
Martijn van Oosterhout writes:
> Interestingly, I notice the windows port of PostgreSQL uses the
> QueryPerformanceCounter() function. I tried playing with it under linux
> and found that Linux suspends the CPU while waiting for things to
> happen. So:
> sleep(1)~ 20 million cycle
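The same effect is easy to demonstrate from userspace with a crude analogue:
a CPU-time clock barely advances across a blocking sleep, while a wall-clock
timer reports the full interval. For example:

    import time

    # time.process_time() counts CPU time only, so it barely moves across
    # a blocking sleep; time.perf_counter() is wall clock and sees the
    # full second -- analogous to a cycle counter on a CPU that idles
    # while waiting.
    wall0, cpu0 = time.perf_counter(), time.process_time()
    time.sleep(1)
    print("wall clock elapsed: %.3f s" % (time.perf_counter() - wall0))
    print("cpu time elapsed:   %.3f s" % (time.process_time() - cpu0))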
Tom Lane <[EMAIL PROTECTED]> writes:
> Hardly --- how will you choose the best subplans if you don't calculate
> their costs?
Uhm, good point. but as you say not really a problem.
> I'm also a bit suspicious of the "it's all a linear equation" premise,
> because the fact of the matter is that th
On Sat, Oct 15, 2005 at 05:53:45PM -0400, Greg Stark wrote:
> Martijn van Oosterhout writes:
>
> > This is unfortunate because EXPLAIN ANALYZE is an immensely useful tool,
> > as far as it goes. I've pondered if some kind of userspace timing
> > mechanism could be introduced (possibly using builti
Greg Stark <[EMAIL PROTECTED]> writes:
> If the optimizer didn't collapse the cost for each node into a single value
> and instead retained the individual parameters at each node it could bubble
> those values all the way up to the surface. Then use the configuration options
> like random_page_cost
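Roughly: each node would carry a small vector of resource counts, and the
scalar cost would only be produced once the parameters are applied at the
top. A toy illustration (not how the planner actually represents costs):

    # Illustrative only: keep per-node resource counts instead of a
    # pre-collapsed scalar, and price them at the top of the plan with
    # the configuration parameters.
    COST_PARAMS = {
        "seq_page_cost": 1.0,
        "random_page_cost": 4.0,
        "cpu_tuple_cost": 0.01,
        "cpu_operator_cost": 0.0025,
    }

    def collapse_cost(resource_counts, params=COST_PARAMS):
        """Turn a node's resource-count vector into a scalar cost."""
        return sum(count * params.get(name, 0.0)
                   for name, count in resource_counts.items())

    # Example: an index scan that touched 120 random pages and 5000 tuples.
    print(collapse_cost({"random_page_cost": 120, "cpu_tuple_cost": 5000}))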
Martijn van Oosterhout writes:
> This is unfortunate because EXPLAIN ANALYZE is an immensely useful tool,
> as far as it goes. I've pondered if some kind of userspace timing
> mechanism could be introduced (possibly using builtin CPU cycle
> counters) to reduce the cost. It does, however, remain a
Tom Lane <[EMAIL PROTECTED]> writes:
> More generally, I think that depending entirely on EXPLAIN ANALYZE
> numbers is a bad idea, because the overhead of EXPLAIN ANALYZE is both
> significant and variable depending on the plan structure. The numbers
> that I think we must capture are the top-le
On Fri, Oct 14, 2005 at 03:34:43PM -0500, Kevin Grittner wrote:
> Of course, if running with EXPLAIN ANALYZE significantly
> distorts the run time, the whole effort is doomed at the outset.
> Can you quantify the distortion you mention? Do you know
To do the calculations for EXPLAIN ANALYZE, Post
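One crude way to quantify the distortion for a particular query is simply to
time the statement both ways (a sketch only; "conn" is an assumed DB-API
connection, and repeated runs are needed to wash out caching effects):

    import time

    def explain_analyze_overhead(conn, query):
        """Rough distortion estimate: wall-clock time with and without
        EXPLAIN ANALYZE instrumentation. Repeat and average in practice."""
        cur = conn.cursor()

        t0 = time.perf_counter()
        cur.execute(query)
        cur.fetchall()
        plain = time.perf_counter() - t0

        t0 = time.perf_counter()
        cur.execute("EXPLAIN ANALYZE " + query)
        cur.fetchall()
        instrumented = time.perf_counter() - t0

        return instrumented / plain

    # e.g. explain_analyze_overhead(conn, "SELECT count(*) FROM some_table")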
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> Dang. Obviously, that's inverted. Also, I'd need to
> factor in the setup time. Bother. Are you sure we can't
> just make sure the test scripts operate against tables
> with accurate statistics?
Well, the point of my comment was that you should re
Dang. Obviously, that's inverted. Also, I'd need to
factor in the setup time. Bother. Are you sure we can't
just make sure the test scripts operate against tables
with accurate statistics?
>>> "Kevin Grittner" <[EMAIL PROTECTED]> 10/14/05 3:34 PM >>>
The ratio I've been looking at should perha
I have to keep a very narrow focus on this, or it is likely that
nothing will come of it. The particular area which is my target
here is the accuracy of the cost values on the subplans
considered by the optimizer. As previously stated, we're getting
hurt by cases where the optimizer looks at t
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> I propose capturing only three values from the output of explain
> analyze, and saving it with many columns of context information.
You really have to capture the rowcounts (est and actual) too.
Otherwise you can't tell if it's a costing problem or a
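A sketch of the sort of check the captured rowcounts make possible
(illustrative only, working from the estimated and actual rows pulled out of
the EXPLAIN ANALYZE output):

    def estimate_error(est_rows, actual_rows):
        """Symmetric ratio (>= 1) showing how far off the row estimate was."""
        est, actual = max(est_rows, 1), max(actual_rows, 1)
        return max(est, actual) / float(min(est, actual))

    # A node whose row estimate is close but whose cost is badly off points
    # at the costing model; a badly wrong row estimate points at the
    # statistics instead.
    print(estimate_error(10000, 9800))    # ~1.02: estimate was fine
    print(estimate_error(100, 250000))    # 2500:  statistics problem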
I think I get your point now. If I understand it, you could
accomplish what you want under my rough ("exists only in my
head so far") design by creating your own test cases and putting
together a script to run just those. I would be extremely leery of
comparing tests against a database under load
Kevin,
> It sounds as though you are more focused on picking up costing
> problems which happen during production -- which is clearly
> valuable, but addresses a somewhat different set of needs than
> I was looking at. That said, it seems like there is potential to share
> significant code between
On Thu, Oct 13, 2005 at 05:39:32PM -0500, Kevin Grittner wrote:
> That said, there's certainly overlap between your effort and
> what I'm going to be developing. Do you have anything from
> your work which might save me some time?
Not really. I got stuck in the query design phase. I didn't even
g
Thanks for the well wishes.
It sounds like you were addressing a slightly different problem --
more ambitious than what I propose to tackle as a first step.
If I understand you, you were trying to develop your own
predictive costing formulas based on plans. I'm merely talking
about a tool to eval
On Thu, Oct 13, 2005 at 01:52:10PM -0500, Kevin Grittner wrote:
> Thanks, Josh, for the feedback.
>
> It sounds as though you are more focused on picking up costing
> problems which happen during production -- which is clearly
> valuable, but addresses a somewhat different set of needs than
> I wa
Thanks, Josh, for the feedback.
It sounds as though you are more focused on picking up costing
problems which happen during production -- which is clearly
valuable, but addresses a somewhat different set of needs than
I was looking at. That said, it seems like there is potential to share
signifca
Kevin,
I'm looking at trying to fix some clear flaws in costing which cause some
of our real-world queries to choose sub-optimal plans under PostgreSQL.
It's clear that there needs to be a tool to analyze the accuracy of
costing for a variety of queries, both to direct any efforts to fix
problems and
Ouch! I just remembered locale and character sets and
encoding. I can't even begin to get my head around what to
do with those, unless it is just to make the tool agnostic
regarding those issues and test against a variety of setups.
Does that seem adequate?
I flash back to my first attempts to u
Good points, Tom. (I wish my client's email software supported
quoting so that I could post replies closer to your points. Sorry
'bout that.)
I tried searching the archives, though, and the words I could think
to search with generated so many hits that it seemed more or less
like a sequential se
Yes I have looked at the TODO list. There is arguably a relationship
to:
* Have EXPLAIN ANALYZE highlight poor optimizer estimates
* Log queries where the optimizer row estimates were dramatically
different from the number of rows actually found?
Neither of these, however, provides a systemat
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> Note that I'm talking about a tool strictly to check the accuracy of
> the estimated costs of plans chosen by the planner, nothing else.
We could definitely do with some infrastructure for testing this.
I concur with Bruce's suggestion that you should
Have you looked at the TODO list to see our previous ideas on tuning
diagnostics?
---
Kevin Grittner wrote:
> I'm looking at trying to fix some clear flaws in costing which cause some
> of our real-world queries to choose sub-opti
I'm looking at trying to fix some clear flaws in costing which cause some
of our real-world queries to choose sub-optimal plans under PostgreSQL.
It's clear that there needs to be a tool to analyze the accuracy of
costing for a variety of queries, both to direct any efforts to fix
problems and to test f