Re: [PERFORM] Terrible plan for join to nested union

Nate Allan Sat, 07 Jul 2012 22:50:54 -0700

Thanks for your reply Tom.

>> I have a query which joins to a nested union and I'm getting a plan which 
>> never returns.  Here is the query simplified as much as possible:
>> select 'anything' as result
>>                from "Attribute" as A1
>>                               inner join
>>                               (
>>                                              select R."TargetID" as 
>> "SourceID"
>>                                                             from 
>> "Relationship" as R
>>                                              union
>>                                              select A2."PersonID" as 
>> "SourceID"
>>                                                             from "Attribute" 
>> as A2
>>                               ) as X on (A1."PersonID" = X."SourceID")
>>                where (A1."ID" = 124791200)
>
>  AFAICS, the UNION result must include every possible value of 
> Attribute.PersonID, which means the inner join cannot 
>eliminate any rows of A1 (except those with null PersonID), which seems a tad 
>silly.


It seems to me that the join condition (and hence the restriction) should be 
pushed down into both sides of the union to bring the cardinality limit from 
millions to 1.  I'm imagining a rewrite like this:  
        R(a) J (b U c)  ->  (b J R(a)) U (c J R(a))
...where R = Restrict, J = Join, U = Union

This is the kind of rewrite I would make as a sentient being and it's one that 
at least one other DBMS I know of makes.

As an aside, even though not as good as pushing down the restriction, the plan 
that the "union all" produces is decent performance-wise:
http://explain.depesz.com/s/OZq
It seems to me that a similar alternative could be applied for a distinct union 
by using two Index Scans followed by a Merge Join.

>What exactly are you trying to accomplish here?

I state in my post that there are several ways to rewrite the query to 
work-around the issue; I'm not really asking for a work-around but a) wondering 
why the plan is so bad; and b) asking if it could be fixed if possible.  
Unfortunately rewriting the query isn't a trivial matter in our case because 
the X (union) part of the query is represented logically as a view, which is 
expected to be restricted and/or joined so as not to actually materialize the 
actual union.  Unfortunately the PostgreSQL planner seems to want to actually 
materialize that view.  Working around this would basically entail not using 
the view, which is used all over the place, and instead duplicating the view's 
logic except pushing the restrictions and/or joins down into both sides of the 
union in each case.  I could do that, but doing so would be: a) against the 
spirit of the Relational Model; b) against the spirit of "fix the planner 
rather than add optimizer hints"; c) a royal pain because it causes a rewrite 
of application logic; d) a point for at least one other DBMS's optimizer.  :-)

>Anyway, I wonder whether you'd get better results with an EXISTS over a 
>correlated UNION ALL subquery, ie, something like
> ...

Thanks for the work-arounds, but again, that's not quite what I'm after.

Best,

-Nate



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Terrible plan for join to nested union

Reply via email to