Bad email client! bad!. (I sent this half-edited before. Sorry!) Jeroen or Stuart are probably the best folk to ask about this topic :) However on performance Tuesday this week I ran into a really clear example...
The subject refers to how to get fast queries out of postgres: ask queries that consult a lot of data from one table, or consult little data from many tables, but never a lot of data from many tables. In bug 787294 I ran into something which may be a case of this. I also had a full-day headache which led to some interesting analysis mistakes ;). For instance, this humdinger candidate query plan: 16:47 #launchpad-dev: < lifeless> Nested Loop (cost=2.76..17359385328490.43 rows=247357031677542 width=4) Anyhow, in this bug we're querying for all the bugs that the team is related to (assigned, subscribed, created, commented or structurally subscribed) to which have patches. The count of patches needed is 102. The current live plan has an estimated cost of 800K: Aggregate (cost=816866.73..816866.74 rows=1 width=0) -> Nested Loop (cost=705926.62..816816.29 rows=20177 width=0) -> HashAggregate (cost=705926.62..706128.39 rows=20177 width=280) -> Append (cost=0.00..704564.67 rows=20177 width=280) with the bulk of the cost coming in from the structural subscription subplan: -> Nested Loop (cost=43392.43..702903.59 rows=20168 width=280) Join Filter: ((public.bugtask.product = structuralsubscription.product) OR (public.bugtask.productseries = structuralsubscription.productseries) OR ((public.product.project = structuralsubscription.project) AND (public.bugtask.product = public.product.id)) OR ((public.bugtask.distribution = structuralsubscription.distribution) AND ((public.bugtask.sourcepackagename = structuralsubscription.sourcepackagename) OR (structuralsubscription.sourcepackagename IS NULL))) OR (public.bugtask.distroseries = structuralsubscription.distroseries) OR (public.bugtask.milestone = structuralsubscription.milestone)) ... I ran the original query on qastaging, and stopped it after 45 minutes. Now, qastaging is missing an index that production and staging has, so I moved over to staging to test. I successively refined the queries I ran to exercise smaller and smaller variables, trying to find what triggered the bad plan. (45 minutes is -much- longer than just doing a raw read of all the data could explain). This query:SELECT distinct BugTask.bug FROM BugTask inner join bug on bug.id=bugtask.bug LEFT JOIN Product ON BugTask.product = Product.id AND Product.active inner join structuralsubscription ss on ( ss.subscriber = 343381 and( BugTask.product = ss.product OR BugTask.productseries = ss.productseries OR Product.project = ss.project OR BugTask.distribution = ss.distribution AND (BugTask.sourcepackagename = ss.sourcepackagename OR ss.sourcepackagename IS NULL) OR BugTask.distroseries = ss.distroseries OR BugTask.milestone = ss.milestone) ) WHERE ((BugTask.status = 10) OR (BugTask.status = 15) OR (BugTask.status = 20) OR (BugTask.status = 21) OR (BugTask.status = 22) OR (BugTask.status = 25)) AND Bug.duplicateof IS NULL AND Bug.latest_patch_uploaded IS NOT NULL AND Bug.private = FALSE; which checks all the structural subscriptions has an estimated cost of 110K: HashAggregate (cost=112166.83..112397.51 rows=23068 width=4) and runs in 88000 ms. This query: SELECT distinct BugTask.bug FROM BugTask inner join bug on bug.id=bugtask.bug LEFT JOIN Product ON BugTask.product = Product.id AND Product.active inner join structuralsubscription ss on ( ss.subscriber = 343381 and( BugTask.distribution = ss.distribution AND (BugTask.sourcepackagename = ss.sourcepackagename OR ss.sourcepackagename IS NULL) ) ) WHERE ((BugTask.status = 10) OR (BugTask.status = 15) OR (BugTask.status = 20) OR (BugTask.status = 21) OR (BugTask.status = 22) OR (BugTask.status = 25)) AND Bug.duplicateof IS NULL AND Bug.latest_patch_uploaded IS NOT NULL AND Bug.private = FALSE; has an estimated cost of 130K: HashAggregate (cost=131351.38..133944.19 rows=259281 width=4) and runs in 900ms. The only difference is checking additional fields in the structural subscription. Now, the full check is triggering a pathological behaviour in postgresql - it may even be a bug. The differing join filter (perhaps due to a statistics triggered thing) changes the plan from walking the subscribed bugtargets (slow case) to walking bugs with patches and filtering down to subscriptions (faster case). AIUI this is what happens with deep-wide queries: the planner eventually decides we'll see all rows and starts walking from the smallest table, which is utterly wrong when we'll reenter large related tables: and the server team has 150 structural subscriptions. So the question became, how to express the query such that postgresql won't be so confused. This query (https://bugs.launchpad.net/launchpad/+bug/787294/comments/11) does that: with ss as (select * from structuralsubscription where subscriber=343381) SELECT distinct BugTask.bug FROM BugTask inner join bug on bug.id=bugtask.bug LEFT JOIN Product ON BugTask.product = Product.id AND Product.active left join ss ss1 on BugTask.product = ss1.product left join ss ss2 on BugTask.productseries = ss2.productseries left join ss ss3 on Product.project = ss3.project left join ss ss4 on BugTask.distribution = ss4.distribution AND (BugTask.sourcepackagename = ss4.sourcepackagename OR ss4.sourcepackagename IS NULL) left join ss ss5 on BugTask.distroseries = ss5.distroseries left join ss ss6 on BugTask.milestone = ss6.milestone WHERE ((BugTask.status = 10) OR (BugTask.status = 15) OR (BugTask.status = 20) OR (BugTask.status = 21) OR (BugTask.status = 22) OR (BugTask.status = 25)) AND Bug.duplicateof IS NULL AND Bug.latest_patch_uploaded IS NOT NULL AND Bug.private = FALSE AND (Bugtask.product IS NULL OR Product.active = TRUE) AND null_count(ARRAY[ss1.id, ss2.id, ss3.id, ss4.id, ss5.id, ss6.id]) < 6; This has a nearly identical cost: HashAggregate (cost=112243.53..112313.50 rows=6997 width=4) But runs quickly enough: Time: 1130.509 ms The change to a number of shallow joins avoided the confusion and the result runs much faster... for this case. Moral of the story: sql is hard :) -Rob _______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : launchpad-dev@lists.launchpad.net Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp