Re: Cluster mismatch between RelNodes of a query and a materialized view

Remus Rusanu Thu, 09 Mar 2017 04:40:29 -0800

I think I have a solution that works, is not intrusive on base RelNode, and can 
be extended by Calcite consumers using custom rel nodes.
A new RelShuttle is added to calcite, calcite.tools.RelCopier. This implements 
the LogicalXxx copy operators. Hive extends this into HiveRelCopier which 
implements the HiveXxx copy operators. This pattern already exists in Hive with 
HiveRelShuttle. I have updated the https://github.com/apache/calcite/pull/392 
with the skeleton of the Calcite side changes, and is entirely contained in a 
shuttle, no RelNode changes required. I will continue working on this to extend 
support for all LogicalXxx operators, to add tests etc.


Thanks,
~Remus


On 3/8/17, 11:40 AM, "Julian Hyde" <[email protected]> wrote:

    > On Mar 8, 2017, at 11:27 AM, Ashutosh Chauhan <[email protected]> 
wrote:
    > 
    > Fact that
    > RelFactories exist is evidence enough that LogicalRelNodes are not
    > sufficient outside of Calcite.
    
    Not true.
    
    Hive uses its own RelNodes far more than other projects do. The other 
projects do a significant part of their planning in the logical domain, and 
only later move to the physical domain. They use RelFactories so that they can 
re-use rules when they are in the physical domain.
    
    Materialized view substitution is a logical operation and it makes sense to 
do it in the logical domain.
    
    I don’t recall the details of the Hive RelNodes but isn’t it true that in 
most cases they have the same information as the corresponding Logical RelNode?
    
    We can find a solution to this problem for Hive RelNodes, but I still feel 
that Hive overuses Hive RelNodes and that causes inefficiencies for both Hive 
and Calcite.
    
    Julian
    
    
    
    
    > On Mar 8, 2017, at 11:27 AM, Ashutosh Chauhan <[email protected]> 
wrote:
    > 
    > Working on LogicalRelNodes doesnt solve problem for Hive (or any other
    > project which use RelFactories to generate custom rel node). Fact that
    > RelFactories exist is evidence enough that LogicalRelNodes are not
    > sufficient outside of Calcite. We have to solve this problem in that
    > context.
    > 
    > On Wed, Mar 8, 2017 at 10:50 AM, Remus Rusanu <[email protected]>
    > wrote:
    > 
    >> To make sure I understand, you’re saying that the cloning would take a
    >> tree of various Xxx operators (eg. HiveProject) and generate a tree of
    >> equivalent LogicalXxx  operators (HiveProject -> LogicalProject,
    >> HiveFilter->LogicalFilter, HiveAggregate ->  LogicalAggregate and so on).
    >> Is my understanding correct? Ignoring Hive issues, wouldn’t that loose
    >> information?
    >> For Hive specific case, the HiveXxx operator extend Xxx not LogicalXxx
    >> (not sure yet if this is an issue or not) but for sure they carry lots of
    >> extra info that would be lost if the clone would contain LogicalXxx 
instead
    >> of HiveXxx.
    >> 
    >> Thanks,
    >> ~Remus
    >> 
    >> On 3/8/17, 9:45 AM, "Julian Hyde" <[email protected]> wrote:
    >> 
    >>    Could you work on LogicalXxx rather than HiveXxx? I know the Hive team
    >> likes to do everything in terms of HiveXxx RelNodes but I’m not sure it 
has
    >> to be that way.
    >> 
    >>    Julian
    >> 
    >>> On Mar 8, 2017, at 9:37 AM, Remus Rusanu <[email protected]>
    >> wrote:
    >>> 
    >>> Agree on the RelOptCluster.
    >>> 
    >>> I noticed how adding new method to RelNode yields a gargantuan task
    >> (from the list of compile errors I got…). But I’m not sure a RelShuttle 
can
    >> handle this. For my test case the 3 nodes that need to be cloned are
    >> HiveScanTable, HiveFilter and HiveProject, all declared in Hive and not
    >> even extending AbstractRelNode but directly base RelNode. A RelShuttle
    >> wouldn’t know about these types, and wouldn’t be able to create them 
(short
    >> of using reflection and adhering to a strict constructor signature, 
which I
    >> think is much too fragile). What I did to get the ball rolling I added a
    >> default implementation in AbstractRelNode (basically assert ‘must be
    >> implemented by subclass’), this allowed me to test easily and, as a proof
    >> of concept, I have it working. But I reckon is fragile test wise, and new
    >> RelNode types wouldn’t know about the requirement to provide a copyTo
    >> implementation.
    >>> 
    >>> Thanks,
    >>> ~Remus
    >>> 
    >>> On 3/8/17, 9:05 AM, "Julian Hyde" <[email protected]> wrote:
    >>> 
    >>>   The argument should be a RelOptCluster, not a RelOptPlanner. The
    >> link from RelNode to planner is indirect currently (via cluster) and will
    >> be non-existent after CALCITE-1536.
    >>> 
    >>>   I question whether we need a new method. Putting an abstract
    >> method on RelNode is a huge burden because every RelNode sub-class needs 
to
    >> be fixed when people upgrade. Even a non-abstract method imposes a
    >> conceptual burden: more methods to understand.
    >>> 
    >>>   So, my approach would be to sub-class RelShuttle. It’s sufficient
    >> that it only works for LogicalXxx nodes.
    >>> 
    >>>   No need to copy RexNode expressions. They are immutable.
    >>> 
    >>>   Julian
    >>> 
    >>> 
    >>>> On Mar 8, 2017, at 4:14 AM, Remus Rusanu <[email protected]>
    >> wrote:
    >>>> 
    >>>> I created CALCITE-1681 https://issues.apache.org/
    >> jira/browse/CALCITE-1681 and I intent to work on it for finishing
    >> HIVE-15708.
    >>>> My current thinking is to create a RelCopier based on RelShuttle
    >> and add a new abstract RelNode.copyTo(RelOptPlanner) that each concrete 
Rel
    >> type must override. The Rex part is already handled by the existing
    >> RexCopier.
    >>>> 
    >>>> Thanks,
    >>>> ~Remus
    >>>> 
    >>>> On 3/6/17, 12:30 PM, "Julian Hyde" <[email protected]> wrote:
    >>>> 
    >>>>  Every RelNode belongs to a RelOptCluster, and basically there is
    >> one RelOptCluster created each time a query is prepared. When working 
with
    >> materialized views, the view’s query is represented as a tree of 
RelNodes,
    >> that tree is used for optimizing more than one query. When planning a
    >> particular query, the nodes of that query will have a different
    >> RelOptCluster than the nodes of the materialized view(s) they are matched
    >> against.
    >>>> 
    >>>>  How do we deal with this? Do we copy the nodes into the query’s
    >> cluster once we have found a match? If so, how? I couldn’t find a 
sub-class
    >> of RelVisitor or RelShuttle that copies trees to a different 
RelOptCluster.
    >>>> 
    >>>>  By the way, https://issues.apache.org/jira/browse/CALCITE-1536 <
    >> https://issues.apache.org/jira/browse/CALCITE-1536> aims to improve the
    >> RelNode life-cycle but I don’t think it will solve this problem.
    >>>> 
    >>>>  Julian
    >>>> 
    >>>> 
    >>> 
    >>> 
    >>> 
    >>> 
    >> 
    >> 
    >> 
    >> 
    >>

Re: Cluster mismatch between RelNodes of a query and a materialized view

Reply via email to