On Tue, Sep 19, 2017 at 3:17 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote:
> On Tue, Sep 19, 2017 at 2:35 AM, Robert Haas <robertmh...@gmail.com> > wrote: > > On Mon, Sep 18, 2017 at 8:02 AM, Ashutosh Bapat > > <ashutosh.ba...@enterprisedb.com> wrote: > >> partition pruning might need partexprs look up relevant quals, but > >> nullable_partexprs doesn't have any use there. So may be we should add > >> nullable_partexpr to RelOptInfo as part of 0002 (partition-wise join > >> implementation) instead of 0001. What do you think? > > > > +1. > > Done. > > > > >>> - I'm not entirely sure whether maintaining partexprs and > >>> nullable_partexprs is the right design. If I understand correctly, > >>> whether or not a partexpr is nullable is really a per-RTI property, > >>> not a per-expression property. You could consider something like > >>> "Relids nullable_rels". > >> > >> That's true. However in order to decide whether an expression falls on > >> nullable side of a join, we will need to call pull_varnos() on it and > >> check the output against nullable_rels. Separating the expressions > >> themselves avoids that step. > > > > Good point. Also, I'm not sure about cases like this: > > > > SELECT * FROM (SELECT b.x, b.y FROM a LEFT JOIN b ON a.x = b.x WHERE > > a.y = b.y) w LEFT JOIN c ON w.x = c.x AND w.y = c.y; > > > > Suppose the relations are all partitioned by (x, y) but that the = > > operator is not strict. A partition-wise join is valid between a and > > b, but we can't regard w as partitioned any more, because w.x might > > contain nulls in partitions where the partitioning scheme wouldn't > > allow them. On the other hand, if the subquery were to select a.x, > > a.y then clearly it would be fine: there would be no possibility of a > > NULL having been substituted for a proper value. > > > > What if the subquery selected a.x, b.y? Initially, I thought that > > would be OK too, because of the fact that the a.y = b.y clause is in > > the WHERE clause rather than the join condition. But on further > > thought I think that probably doesn't work, because with = being a > > non-strict operator there's no guarantee that it would remove any > > nulls introduced by the left join. Of course, if the subselect had a > > WHERE clause saying that b.x/b.y IS NOT NULL then having the SELECT > > list mention those columns would be fine. > > > > I am actually not sure whether we can use partition-wise join for a > LEFT JOIN b when the partition key equalities are spread across ON and > WHERE clauses. I am not able to find any example against it, but I am > not able to prove it as well. The reference I used for partition-wise > join [1], mentions JOIN conditions i.e. ON clause conditions. But all > the examples used in that paper are that of INNER join. So, I am not > sure what exactly the authors meant by JOIN conditions. Right now I am > restricting the patch to work with only conditions in the ON clause. > > Practically most of the operators are strict. OUTER join's WHERE > clause has any partition key equality with strict operator, optimizer > will turn > that OUTER join into an INNER one, turning all clauses into join > clauses. That will enable partition-wise join. So, the current > restriction doesn't restrict any practical cases. > > OTOH, I have seen that treating ON and WHERE clauses as same for an > OUTER join leads to surprising results. So, I am leaning to treat them > separate for partition-wise join as well and only use ON clause > conditions for partition-wise join. If we get complaints about > partition-wise join not being picked we will fix them after proving > that it's not harmful. Lifting that restriction is not so difficult. > have_partition_key_equijoin() ignores "pushed down" quals. We have to > just change that condition. > > Your last sentence about a clause b.x IS NOT NULL or b.y IS NOT NULL > is interesting. If those conditions are in ON clause, we may still > have a result where b.x and b.y as NULL when no row in "a" matches a > row in "b". If those conditions are in WHERE clause, I think optimizer > will turn the join into an INNER join irrespective of whether the > equality operator is strict. > > > > >> If partition-wise join is disabled, partition-wise aggregates, > >> strength reduction of MergeAppend won't be possible on a join tree, > >> but those will be possible on a base relation. Even if partition-wise > >> join enabled, one may want to disable other partition-wise > >> optimizations individually. So, they are somewhat independent > >> switches. I don't think we should bundle all of those into one. > >> Whatever names we choose for those GUCs, I think they should have same > >> naming convention e.g. "partition_wise_xyz". I am open to suggestions > >> about the names. > > > > I think the chances of you getting multiple GUCs for different > > partition-wise optimizations past Tom are pretty low. > > We do have enable_hashjoin and enable_hashagg to control use of > hashing for aggregate and join. On similar lines we can have three > GUCs to enable use of partition-wise strategy, one for each of join, > aggregation and sorting. Having granular switches would be useful for > debugging and may be to turn partition-wise strategies off when they > are not optimal. I think having a granular control over each of these optimization will be handy for the DBAs too. > Do we want a switch to turn ON/OFF partition pruning? > Said, that I am fine with single GUC controlling all. We won't set any > partitioning information in RelOptInfo if that GUC is turned OFF. > > [1] https://pdfs.semanticscholar.org/27c2/ba75f8b6a39d4bce85d5579dace609 > c9abaa.pdf > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > -- Jeevan Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company