On Tue, Sep 19, 2017 at 3:17 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Tue, Sep 19, 2017 at 2:35 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Mon, Sep 18, 2017 at 8:02 AM, Ashutosh Bapat
> > <ashutosh.ba...@enterprisedb.com> wrote:
> >> partition pruning might need partexprs look up relevant quals, but
> >> nullable_partexprs doesn't have any use there. So may be we should add
> >> nullable_partexpr to RelOptInfo as part of 0002 (partition-wise join
> >> implementation) instead of 0001. What do you think?
> >
> > +1.
>
> Done.
>
> >
> >>> - I'm not entirely sure whether maintaining partexprs and
> >>> nullable_partexprs is the right design.  If I understand correctly,
> >>> whether or not a partexpr is nullable is really a per-RTI property,
> >>> not a per-expression property.  You could consider something like
> >>> "Relids nullable_rels".
> >>
> >> That's true. However in order to decide whether an expression falls on
> >> nullable side of a join, we will need to call pull_varnos() on it and
> >> check the output against nullable_rels. Separating the expressions
> >> themselves avoids that step.
> >
> > Good point.  Also, I'm not sure about cases like this:
> >
> > SELECT * FROM (SELECT b.x, b.y FROM a LEFT JOIN b ON a.x = b.x WHERE
> > a.y = b.y) w LEFT JOIN c ON w.x = c.x AND w.y = c.y;
> >
> > Suppose the relations are all partitioned by (x, y) but that the =
> > operator is not strict.  A partition-wise join is valid between a and
> > b, but we can't regard w as partitioned any more, because w.x might
> > contain nulls in partitions where the partitioning scheme wouldn't
> > allow them.  On the other hand, if the subquery were to select a.x,
> > a.y then clearly it would be fine: there would be no possibility of a
> > NULL having been substituted for a proper value.
> >
> > What if the subquery selected a.x, b.y?  Initially, I thought that
> > would be OK too, because of the fact that the a.y = b.y clause is in
> > the WHERE clause rather than the join condition.  But on further
> > thought I think that probably doesn't work, because with = being a
> > non-strict operator there's no guarantee that it would remove any
> > nulls introduced by the left join.  Of course, if the subselect had a
> > WHERE clause saying that b.x/b.y IS NOT NULL then having the SELECT
> > list mention those columns would be fine.
> >
>
> I am actually not sure whether we can use partition-wise join for a
> LEFT JOIN b when the partition key equalities are spread across ON and
> WHERE clauses. I am not able to find any example against it, but I am
> not able to prove it as well. The reference I used for partition-wise
> join [1], mentions JOIN conditions i.e. ON clause conditions. But all
> the examples used in that paper are that of INNER join. So, I am not
> sure what exactly the authors meant by JOIN conditions. Right now I am
> restricting the patch to work with only conditions in the ON clause.
>
> Practically most of the operators are strict. OUTER join's WHERE
> clause has any partition key equality with strict operator, optimizer
> will turn
> that OUTER join into an INNER one, turning all clauses into join
> clauses. That will enable partition-wise join. So, the current
> restriction doesn't restrict any practical cases.
>
> OTOH, I have seen that treating ON and WHERE clauses as same for an
> OUTER join leads to surprising results. So, I am leaning to treat them
> separate for partition-wise join as well and only use ON clause
> conditions for partition-wise join. If we get complaints about
> partition-wise join not being picked we will fix them after proving
> that it's not harmful. Lifting that restriction is not so difficult.
> have_partition_key_equijoin() ignores "pushed down" quals. We have to
> just change that condition.
>
> Your last sentence about a clause b.x IS NOT NULL or b.y IS NOT NULL
> is interesting. If those conditions are in ON clause, we may still
> have a result where b.x and b.y as NULL when no row in "a" matches a
> row in "b". If those conditions are in WHERE clause, I think optimizer
> will turn the join into an INNER join irrespective of whether the
> equality operator is strict.
>
> >
> >> If partition-wise join is disabled, partition-wise aggregates,
> >> strength reduction of MergeAppend won't be possible on a join tree,
> >> but those will be possible on a base relation. Even if partition-wise
> >> join enabled, one may want to disable other partition-wise
> >> optimizations individually. So, they are somewhat independent
> >> switches. I don't think we should bundle all of those into one.
> >> Whatever names we choose for those GUCs, I think they should have same
> >> naming convention e.g. "partition_wise_xyz". I am open to suggestions
> >> about the names.
> >
> > I think the chances of you getting multiple GUCs for different
> > partition-wise optimizations past Tom are pretty low.
>
> We do have enable_hashjoin and enable_hashagg to control use of
> hashing for aggregate and join. On similar lines we can have three
> GUCs to enable use of partition-wise strategy, one for each of join,
> aggregation and sorting. Having granular switches would be useful for
> debugging and may be to turn partition-wise strategies off when they
> are not optimal.


I think having a granular control over each of these optimization will be
handy for the DBAs too.


> Do we want a switch to turn ON/OFF partition pruning?
> Said, that I am fine with single GUC controlling all. We won't set any
> partitioning information in RelOptInfo if that GUC is turned OFF.
>
> [1] https://pdfs.semanticscholar.org/27c2/ba75f8b6a39d4bce85d5579dace609
> c9abaa.pdf
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Reply via email to