On Wed, Aug 23, 2006 at 08:42:10AM -0700, Mark Dilger wrote: > Jim C. Nasby wrote: > >On Tue, Aug 22, 2006 at 11:56:17AM -0700, Mark Dilger wrote: > >>I proposed something like this quite a bit up-thread. I was hoping we > >>could have a mode in which the system would run the second, third, > >>fourth, ... best plans rather than just the best looking one, and then > >>determine from actual runtime statistics which was best. (The proposal > >>also included the ability to output the best plan and read that in at a > >>later time in lieu of a SQL query, but that part of it can be ignored if > >>you like.) The posting didn't generate much response, so I'm not sure > >>what people thought of it. The only major problem I see is getting the > >>planner to keep track of alternate plans. I don't know the internals of > >>it very well, but I think the genetic query optimizer doesn't have a > >>concept of "runner-up #1", "runner-up #2", etc., which it would need to > >>have. > > > >I think the biggest issue is that you'd have to account for varying load > >on the box. If we assume that the database is the only thing running on > >the box, we might be able to do that by looking at things like how much > >IO traffic we generated (though of course OS caching will screw with > >that). > > > >Actually, that's another issue... any plans run after the first one will > >show up as being artificially fast, since there will be a lot of extra > >cached data. > > Yes, caching issues prevent you from using wall-clock time. We could > instrument the code to count the number of rows vs. the number predicted > for each internal join, from which new cost estimates could be generated. But if you're only looking at the number of rows, I suspect there's no need to actually run the other plans; you can just look at how many rows you got in the plan you used. Worst-case, you may have to figure out the correlation stats for the result-set, which could probably be done on the fly without too much impact.
> Perhaps you can check my reasoning for me: I'm imagining a query which > computes AxBxCxD, where A, B, C, and D are actual tables. I'm also > imagining that the planner always chooses AxB first, then joins on C, then > joins on D. (It does so because the single-table statistics suggest this > as the best course of action.) It might be that AxD is a really small > metatable, much smaller than would be estimated from the statistics for A > independent of the statistics for D, but AxB is pretty much what you would > expect given the independent statistics for A and B. So we need some way > for the system to stumble upon that fact. If we only ever calculate > cross-join statistics for plans that the system chooses, we will only > discover that AxB is about the size we expected it to be. So, if the > actual size of AxB is nearly equal to the estimated size of AxB, the system > will continue to choose the same plan in future queries, totally ignorant > of the advantages of doing AxD first. Is there actually evidence that there's a lot of problems with bad join orders? ISTM that's one of the areas where the planner actually does a pretty good job. > That last paragraph is my reasoning for suggesting that the system have a > mode in which it runs the "runner-up #1", "runner-up #2", etc sorts of > plans. Such a mode could force it down alternate paths where it might pick > up interesting statistics that it wouldn't find otherwise. > > This idea could be changed somewhat. Rather than running the other plans, > we could just extract from them which alternate joins they include, and > consider also calculating those join statistics. > > mark > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings