[
https://issues.apache.org/jira/browse/CALCITE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742592#comment-17742592
]
Yu Tian commented on CALCITE-5842:
----------------------------------
Hi [~julianhyde]
I am not quite getting your point. Do you mean adding rowType related info
inside the hashcode method is an enhancement we can consider/do?
I am not sure the relationships between PairList and rowType(or this use case),
the only thing I can think of, is that rowType contains the fieldList, which
contains the field name -> field data type, which is a perfect fit for PairList
use case. Are you implying that we should do the enhancement, use PairList to
load fieldList from rowType into the hashcode calculation of LogicalProject?
If you are okay with the enhancement and need resources to implement the
changes, I can help with a PR for it, please let me know. Thanks for the quick
reply, appreciate it.
> LogicalProject deepHashCode creates same value with different RowType
> ----------------------------------------------------------------------
>
> Key: CALCITE-5842
> URL: https://issues.apache.org/jira/browse/CALCITE-5842
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.32.0
> Reporter: Yu Tian
> Priority: Major
>
> The LogicalProject class has deepEquals0 and deepHashCode0 methods, in the
> deepEquals0 method, it consider getRowType() as one equal standard, however,
> in the deepHashCode0, it is missing the getRowType() to generated the hash
> value. Do we do this on purpose or it is a bug?
> [https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Project.java#L348,L368]
>
>
> The reason we ask is that we are trying 2 use cases from our side.
> The first one is two LogicalTableScan with similar configurations, which are
> connected to 2 separate LogicalFiler, then we LogicalJoin these 2 together.
> One issue we noticed is that, in HepPlanner, it has logics as below
>
> package org.apache.calcite.plan.hep.HepPlanner
> {code:java}
> // try to find equivalent rel only if DAG is allowed
> if (!noDag) {
> // Now, check if an equivalent vertex already exists in graph.
> HepRelVertex equivVertex = mapDigestToVertex.get(rel.getRelDigest());
> if (equivVertex != null) {
> // Use existing vertex.
> return equivVertex;
> }
> } {code}
>
> The 2 logicalProjects from the 2 LogicalTableScans have same hashCode value
> based on the deepHashCode method in LogicalProject, because it didn’t
> consider the getRowType() value, the planner is replacing LogicalTableScan2
> with LogicalTableScan1, in fact, we should treat them as separate items to
> process.
>
> Another use case we have, we have 2 diagrams, each diagram with
> LogicalTableScan, LogicalFiler, LogicalTableModify, LogicalTableScan have
> similar setup with different rowType information. This time, HepPlanner is
> passing, since it has separate HepPlanner stage, so above issue is not
> happening. However, when it reach the VolcanoPlanner, the logics
>
> package org.apache.calcite.plan.volcano.VolcanoPlanner
> {code:java}
> // If it is equivalent to an existing expression, return the set that
> // the equivalent expression belongs to.
> RelDigest digest = rel.getRelDigest();
> RelNode equivExp = mapDigestToRel.get(digest); {code}
>
> The map replace the LogicalTableScan1 with LogicalTableScan2 in the
> LogicalProject stage since they have same hashCode, and the map is reusing
> earlier processed RelNode, which caused the issues.
>
> Here are the proposals we have,
>
> * Narrow Scope change: LogicalProject is the most frequently used project
> type, we only change it.
> ** Modify the LogicalProject method deepHashCode method to use
> {code:java}
> @Override public int deepHashCode() {
> return Objects.hash(traitSet, input.deepHashCode(), exps, hints,
> getRowType());
> }{code}
> Consider the getRowType() value in the hash generation will resolve the
> issue, since the rowType contains the field names and data types information.
>
> * Whole Scope change: Change the deepHashCode method in Project class.
> ** Similar change as above, however, the scope of this change is wide
> compared to the first one.
>
> Is it something we can consider to improve in the following release of Apache
> Calcite?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)