Jim C. Nasby wrote:
> Been suggested before... the problem is actually doing something useful
> with all that data that's collected, as well as how to collect it
> without greatly impacting the system.
Identifying the best plan by means of actually running multiple plans and timing
them is useful. That would be the point.
As far as "without greatly impacting the system", I don't think that is a real
concern. The whole idea is to greatly impact the system *once*, sometime when
the DBA doesn't mind impacting the system (like before you go live on a
production network, or between midnight and 3 AM, or whatever), and then store
the best plan for future use.
The planner trades off the desire to find the best plan against the need to find
a plan quickly. It also chooses a plan based on statistics rather than actual
runtimes (there is a chicken-and-egg problem: how do you know which plan has the
smallest runtime without running it?), so the plan that looks best according to
the statistics might not actually be best.
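To make that concrete, EXPLAIN ANALYZE already shows the gap between what the
planner guessed and what actually happened (the table and query here are made
up, just to illustrate where to look):

    -- "rows=" inside the cost parentheses is the planner's estimate taken
    -- from the statistics; "rows=" inside the "actual time" parentheses is
    -- what really happened at runtime.
    EXPLAIN ANALYZE
    SELECT o.*
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'EU';

When those two numbers diverge badly, the plan that looked best on paper usually
isn't the plan that runs fastest.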
The idea I'm proposing sidesteps that whole trade-off by explicitly choosing to
let the planner run for a really long time. But it doesn't do that "at runtime":
you don't pay the cost for each query, you pay it once up front and are done
with it.
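For what it's worth, you can crudely approximate this by hand today -- the query
below is made up, and this is only a sketch -- by turning the planner's methods
off one at a time and timing whatever plan it falls back to:

    -- Time the plan the planner prefers on its own.
    EXPLAIN ANALYZE
    SELECT count(*) FROM orders o JOIN customers c ON c.id = o.customer_id;

    -- Knock out one join method and time the alternative it picks instead.
    SET enable_hashjoin = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM orders o JOIN customers c ON c.id = o.customer_id;
    RESET enable_hashjoin;

    -- And again for the next method.
    SET enable_nestloop = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM orders o JOIN customers c ON c.id = o.customer_id;
    RESET enable_nestloop;

Repeat with enable_mergejoin, enable_seqscan, enable_indexscan, and so on, and
keep whichever settings give the fastest plan. That is exactly the drudgery the
mode I'm describing would automate, and its result is what would get stored.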
Of course, this is only useful for people with reasonably static queries and
reasonably static table statistics, so that a good plan found up-front continues
to be a good plan as it is repeatedly used.
My personal motivation is that I have tables whose statistics are quite static.
The data itself changes, but the statistical distribution from which the data
is pulled is unchanging, so the table statistics end up about the same even as
the data itself is added and deleted. On top of that, the planner keeps
choosing the wrong plan. I know this because I can make individual queries run
faster by structuring them in ways the planner can't see through, so it can't
"optimize" away the particular plan I am effectively giving it.
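A concrete example of the kind of restructuring I mean (the table names are made
up): sticking an OFFSET 0 into a subquery keeps the planner from flattening it,
so the query is evaluated the way I wrote it rather than the way the planner
would rewrite it:

    -- Without the OFFSET 0 the planner flattens the subquery and plans the
    -- whole join its own way; with it, the subquery is computed as written
    -- and its result joined afterwards.
    SELECT *
    FROM (SELECT * FROM line_items WHERE rare_flag OFFSET 0) AS li
    JOIN orders o ON o.id = li.order_id;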
But this is a PITA for me, especially since I don't always know what the best
plan might be and have to try them all until I find the right one. (With the
added complexity that I can't always figure out how to trick the planner into
choosing a specific plan, and hence can't test it.) It would be *so much
easier* to have an option to tell the planner to try them all.
mark
> On Tue, Aug 08, 2006 at 08:23:05AM -0700, Mark Dilger wrote:
> > If this feature I'm proposing already exists, sorry for the waste of
> > bandwidth, and could someone please point me to it? :)
> > What if there were a mode that told postgres to do an exhaustive search (or,
> > if not exhaustive, then a much more extensive search) of all plans (or many
> > plans), trying each plan, reporting the performance of each, and discarding
> > the query results, much like "explain analyze" does. Postgres could then
> > dump the best plan in a machine-readable (and semi-human-readable) form which
> > the planner could parse and use at some later date in lieu of a SQL query.
> > This would allow people with reasonably static table statistics (where the
> > best plan is not likely to change) to spend upfront cycles investigating
> > the best plan and then embed that plan in their business logic. Since the
> > stored plan is both written and read by postgres, it can get quite
> > complicated without putting a burden on humans to read and write such
> > complicated things. It would also remove the risk that the planner will
> > occasionally (due to its nondeterministic workings) choose a really bad
> > plan and stall a production system.
> > mark