Neil Conway <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> Where are we going to find a representative test set of >> dozen-or-more- way SQL join queries?
> As far as getting good lotsa-join queries, I think we can either: > (1) generate the queries programmatically > For example, star-schema join queries might be tractable via this > method. Sure, we can generate umpteen thousand star joins in no time, but they are all the same problem. I don't think this is particularly helpful either for finding optimizer bugs or for making choices about performance issues. An example of the kind of thing I'm worried about: I realized just yesterday that GEQO is actively broken in 7.4 because it cannot generate "bushy" plans. As of 7.4 there are cases involving IN constructs where the only valid plans are bushy. For example, in the regression database: regression=# set geqo_threshold to 3; SET regression=# explain select * from tenk1 where regression-# unique1 in (select unique2 from tenk1 t2, int4_tbl t3 where hundred = f1) and regression-# unique2 in (select unique1 from tenk1 t4, int4_tbl t5 where hundred = f1); ERROR: failed to make a valid plan You could test star joins all day long and not find that bug. > (2) get the queries manually > This would involve either writing schema and a bunch of queries for > an "example app" (a la the Java Web Store), or getting a sanitized > version of the schema & common queries used by a few large PG > users. The latter might be the better way to go... The only thing I'd really trust is a sampling of complex queries from different real-world applications. This will probably be hard to get, and we can only hope to have dozens of queries not hundreds or thousands. We will also need to think about how we will get the pg_statistic entries to correspond to the real-world situations. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings