On Fri, Feb 27, 2009 at 11:06 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > One other thought to roll around in your head: at the time that the > current add_path logic was designed, compare_pathkeys was ungodly > expensive, which is why the code tries to compare costs first. > We've since introduced the "canonical pathkey" representation, which > makes compare_pathkeys enough cheaper that it might not be insane > to do it in the other order. I don't immediately see an easy win > there, because as things are structured we typically want the cost > comparison result anyway for list sorting purposes. But maybe there's > a way around that.
I think the best think to do with the pathkeys comparison is avoid it altogether as frequently as possible. One thing I've been thinking about is trying to refactor add_paths_to_joinrel() so that it doesn't need to be called twice for every pair of input relations. This saves some cycles in a few different places: you avoid recomputing the merge clauses and hash clauses twice, and because you now generate all of the unsorted hash-paths in the same piece of code, you can do an add_similar_paths()-type thing where you only compare costs first and then throw just the winners into the main list. I think there are some minor economies in sort_inner_and_outer() as well. I haven't been able to convince myself that all of this adds up to a measurable win, though. The problem is sort_inner_and_outer and hash_inner_and_outer actually generate only a relatively small number of paths. The latter generates at most two, and the former generates length(mergeclause_list), which at least for the kinds of queries that I usually write is most often one. Maybe two? On the other hand, match_unsorted_outer generates up to 5 nestloops plus potentially several merge joins PER OUTER PATH. That's a much bigger number. Ergo, if you want to make join planning fast, you need to make match_unsorted_outer() consider fewer paths or consider them more quickly. And, unfortunately, it's not clear how to do that. I have a nagging feeling that the logic around when to try materializing the inner path could be improved. We skip it when the cheapest total inner path is an index scan, bitmap heap scan, tid scan, material path, function scan, cte scan, or work table scan (which is an irritatingly long list... is there a better way to do this?), but cost_nestloop() only costs the thing differently if the inner side is a material path or hash path. I sort of wonder if cost_nestloop() could be made to decide when materialization of the inner path is warranted. That might help some, but I still think the fact that match_unsorted_outer() generates a number of paths that is linear in the size of the outer pathlist (rather than a constant) is going to tend to make it the most expensive part of join planning. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers