Jim C. Nasby wrote:
Been suggested before... the problem is actually doing something useful
with all that data that's collected, as well as how to collect it
without greatly impacting the system.

Identifying the best plan by actually running multiple candidate plans and timing them is useful. That would be the point.

As for "without greatly impacting the system", I don't think that is a real concern. The whole idea is to greatly impact the system *once*, at a time when the DBA doesn't mind impacting it (before going live on a production network, or between midnight and 3 AM, or whatever), and then store the best plan for future use.

The planner trades off the desire to find the best plan against the need to find a plan quickly. It also chooses a plan based on statistics rather than actual runtimes (there is a chicken-and-egg problem: how do you know which plan has the smallest runtime without running it?), so the plan that looks best according to the statistics might not actually be the best one.
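
You can see the mismatch directly, because EXPLAIN ANALYZE reports the planner's estimates and the actual runtimes side by side. A minimal illustration, with made-up table and column names:

  -- Estimated cost and row counts come from the statistics; "actual time"
  -- and actual row counts come from really executing the plan.  When they
  -- disagree badly, the plan the planner picked was probably not the fastest.
  EXPLAIN ANALYZE
  SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
  WHERE o.order_date > DATE '2006-01-01';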

The idea I'm proposing sidesteps that trade-off by explicitly choosing to make the planner run really slowly and take a really long time. But it doesn't do that "at runtime", in the sense that you don't do it for each query. You do it once up front and are done with it. Of course, this is only useful for people with reasonably static queries and reasonably static table statistics, so that a good plan found up front continues to be a good plan as it is repeatedly used.
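
(For what it's worth, the existing settings already let you push the planner toward the slow-but-thorough end of that trade-off. This is not the exhaustive, timed search I'm describing, but it shows the kind of knob that exists today; the parameters below are real, the values are just examples.)

  -- Disable the genetic optimizer and raise the collapse limits so the
  -- planner exhaustively enumerates join orders instead of falling back to
  -- heuristics.  Planning gets slower; large joins often get better plans.
  SET geqo = off;
  SET join_collapse_limit = 20;
  SET from_collapse_limit = 20;
  -- ...then run EXPLAIN ANALYZE on the query of interest under these settings.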

My personal motivation is that I have tables whose statistics are quite static. The data itself changes, but the statistical distribution it is drawn from does not, so the table statistics stay about the same even as rows are added and deleted. On top of that, the planner keeps choosing the wrong plan; I know this because I can make individual queries run faster by structuring them in ways the planner can't see through and "optimize" away the particular plan I am effectively handing it. But this is a PITA, especially since I don't always know what the best plan might be and have to try them all until I find the right one. (With the added complication that I can't always figure out how to trick the planner into choosing a specific plan, and hence can't test it.) It would be *so much easier* to have an option that tells the planner to try them all.
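
Another way to coax a specific plan, short of restructuring the query, is with the enable_* parameters. Those settings are real, though they only penalize a node type rather than forbid it outright, and they still don't let me reach every plan shape; the table and query below are made-up placeholders:

  BEGIN;
  -- Strongly discourage the node types the planner keeps choosing, then
  -- time whatever plan it falls back to.  SET LOCAL reverts at transaction end.
  SET LOCAL enable_seqscan = off;
  SET LOCAL enable_nestloop = off;
  EXPLAIN ANALYZE
  SELECT * FROM measurements WHERE sensor_id = 42;
  ROLLBACK;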

mark

On Tue, Aug 08, 2006 at 08:23:05AM -0700, Mark Dilger wrote:
If this feature I'm proposing already exists, sorry for the waste of bandwidth, and could someone please point me to it? :)

What if there were a mode that told postgres to do an exhaustive search (or, if not exhaustive, then a much more extensive search) of all plans (or many plans), trying each plan, reporting the performance of each, and discarding the query results, much like "explain analyze" does? Postgres could then dump the best plan in a machine-readable (and semi-human-readable) form which the planner could parse and use at some later date in lieu of a SQL query.

This would allow people with reasonably static table statistics (where the best plan is not likely to change) to spend upfront cycles investigating the best plan and then embed that plan in their business logic. Since the stored plan is both written and read by postgres, it can get quite complicated without putting a burden on humans to read and write such complicated things. It would also remove the risk that the planner will occasionally (due to its nondeterministic workings) choose a really bad plan and stall a production system.
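
The closest existing relative I know of is a prepared statement, which plans a query once per session and then reuses that plan, but it neither searches exhaustively nor persists across sessions; a quick sketch with made-up names:

  -- PREPARE plans the statement once for this session and EXECUTE reuses
  -- that plan; it does not time alternative plans or survive a reconnect,
  -- which is the gap the proposed feature would fill.
  PREPARE report_q1 (date) AS
    SELECT r.region, sum(s.amount)
    FROM sales s
    JOIN regions r ON r.region_id = s.region_id
    WHERE s.sale_date > $1
    GROUP BY r.region;

  EXECUTE report_q1 ('2006-01-01');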

mark
