Re: [PERFORM] Terrible plan for join to nested union

Pavel Stehule Sat, 07 Jul 2012 23:05:09 -0700

2012/7/8 Nate Allan <nal...@ancestry.com>:
> Thanks for your reply Tom.
>
>>> I have a query which joins to a nested union and I'm getting a plan which 
>>> never returns.  Here is the query simplified as much as possible:
>>> select 'anything' as result
>>>                from "Attribute" as A1
>>>                               inner join
>>>                               (
>>>                                              select R."TargetID" as 
>>> "SourceID"
>>>                                                             from 
>>> "Relationship" as R
>>>                                              union
>>>                                              select A2."PersonID" as 
>>> "SourceID"
>>>                                                             from 
>>> "Attribute" as A2
>>>                               ) as X on (A1."PersonID" = X."SourceID")
>>>                where (A1."ID" = 124791200)
>>
>>  AFAICS, the UNION result must include every possible value of 
>> Attribute.PersonID, which means the inner join cannot
>>eliminate any rows of A1 (except those with null PersonID), which seems a tad 
>>silly.
>
> It seems to me that the join condition (and hence the restriction) should be 
> pushed down into both sides of the union to bring the cardinality limit from 
> millions to 1.  I'm imagining a rewrite like this:
>         R(a) J (b U c)  ->  (b J R(a)) U (c J R(a))
> ...where R = Restrict, J = Join, U = Union
>
> This is the kind of rewrite I would make as a sentient being and it's one 
> that at least one other DBMS I know of makes.
>
> As an aside, even though not as good as pushing down the restriction, the 
> plan that the "union all" produces is decent performance-wise:
> http://explain.depesz.com/s/OZq
> It seems to me that a similar alternative could be applied for a distinct 
> union by using two Index Scans followed by a Merge Join.
>
>>What exactly are you trying to accomplish here?
>
> I state in my post that there are several ways to rewrite the query to 
> work-around the issue; I'm not really asking for a work-around but a) 
> wondering why the plan is so bad; and b) asking if it could be fixed if 
> possible.  Unfortunately rewriting the query isn't a trivial matter in our 
> case because the X (union) part of the query is represented logically as a 
> view, which is expected to be restricted and/or joined so as not to actually 
> materialize the actual union.  Unfortunately the PostgreSQL planner seems to 
> want to actually materialize that view.  Working around this would basically 
> entail not using the view, which is used all over the place, and instead 
> duplicating the view's logic except pushing the restrictions and/or joins 
> down into both sides of the union in each case.  I could do that, but doing 
> so would be: a) against the spirit of the Relational Model; b) against the 
> spirit of "fix the planner rather than add optimizer hints"; c) a royal pain 
> because it causes a rewrite of application logic; d) a point for at least one 
> other DBMS's optimizer.  :-)


you are using EAV schema - it is against to relation model enough :)

this schema has the most terrible performance for large datasets -
looks on hstore instead

Regards

Pavel

>
>>Anyway, I wonder whether you'd get better results with an EXISTS over a 
>>correlated UNION ALL subquery, ie, something like
>> ...
>
> Thanks for the work-arounds, but again, that's not quite what I'm after.
>
> Best,
>
> -Nate
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Terrible plan for join to nested union

Reply via email to