[
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840251#comment-16840251
]
Lai Zhou commented on CALCITE-2973:
-----------------------------------
[~rubenql], good analysis. I tested this solution, but some tests still
fail. The report:
{code:java}
[ERROR] Tests run: 5018, Failures: 47, Errors: 7, Skipped: 115
[ERROR] Errors:
[ERROR]   LatticeSuggesterTest.testEmpDept:76 » IndexOutOfBounds index (8) must be less ...
[ERROR]   LatticeSuggesterTest.testExpressionInAggregate:272 » IndexOutOfBounds index (3...
[ERROR]   LatticeSuggesterTest.testFoodMartAll:389->checkFoodMartAll:301 » IndexOutOfBounds
[ERROR]   LatticeSuggesterTest.testFoodMartAllEvolve:393->checkFoodMartAll:301 » IndexOutOfBounds
[ERROR]   LatticeSuggesterTest.testFoodmart:153 » IndexOutOfBounds index (17) must be le...
[ERROR]   LatticeSuggesterTest.testSharedSnowflake:264 » IndexOutOfBounds index (31) mus...
[ERROR]   MaterializationTest.testJoinMaterialization9:1825->checkMaterialize:202->checkMaterialize:210 » SQL
{code}
Looking at LatticeSuggesterTest.testSharedSnowflake, I found that the check
!join.analyzeCondition().isEqui()
is what breaks this query.
If I keep the line as
{code:java}
!(join instanceof EquiJoin)
{code}
almost all of the reported failing tests pass, except
MaterializationTest.testJoinMaterialization9. You can change this line to see
more details. I think this modification is not safe.
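The two checks look at different things: {{join instanceof EquiJoin}} tests the class of the join node, while {{join.analyzeCondition().isEqui()}} inspects the condition itself. Roughly speaking, {{analyzeCondition()}} splits the condition into equality key pairs plus whatever is left over, and {{isEqui()}} is true only when nothing is left over. So a join node that is not an {{EquiJoin}} subclass can still have a pure equi condition, and the two checks can disagree for the same join. The following is a minimal, simplified sketch of that condition-based classification. The {{Conjunct}} and {{Analysis}} types and the {{analyze}} method are hypothetical, invented for illustration; this is not Calcite's {{JoinInfo}} API, and it only handles column-to-column comparisons in a conjunction.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a join condition (NOT Calcite's JoinInfo API),
 * used only to illustrate a condition-based "is this an equi-join?" test.
 */
public class JoinConditionSketch {

  /** One conjunct of an AND-ed join condition: leftColumn OP rightColumn. */
  record Conjunct(int leftColumn, String op, int rightColumn) {}

  /** Result of splitting a condition into equi keys and leftover predicates. */
  record Analysis(List<Integer> leftKeys, List<Integer> rightKeys,
      List<Conjunct> nonEquiConditions) {
    /** A join is "equi" only if nothing is left after extracting the keys. */
    boolean isEqui() {
      return nonEquiConditions.isEmpty();
    }
  }

  /** Equality conjuncts become key pairs; everything else is kept as residual. */
  static Analysis analyze(List<Conjunct> condition) {
    List<Integer> leftKeys = new ArrayList<>();
    List<Integer> rightKeys = new ArrayList<>();
    List<Conjunct> rest = new ArrayList<>();
    for (Conjunct c : condition) {
      if (c.op().equals("=")) {
        leftKeys.add(c.leftColumn());
        rightKeys.add(c.rightColumn());
      } else {
        rest.add(c);
      }
    }
    return new Analysis(leftKeys, rightKeys, rest);
  }

  public static void main(String[] args) {
    // left.$0 = right.$0 : a pure equi condition
    Analysis pure = analyze(List.of(new Conjunct(0, "=", 0)));
    // left.$0 = right.$0 AND left.$1 < right.$1 : equi keys plus a theta part
    Analysis theta = analyze(
        List.of(new Conjunct(0, "=", 0), new Conjunct(1, "<", 1)));
    System.out.println(pure.isEqui());    // true
    System.out.println(theta.isEqui());   // false
    System.out.println(theta.leftKeys()); // [0]
  }
}
```

The second condition still has a usable equi key ({{[0]}}), which is exactly the kind of theta join this issue wants to run with a hash join; a check based on the condition classifies it differently from a check based on the node's class.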
> Allow theta joins that have equi conditions to be executed using a hash join
> algorithm
> --------------------------------------------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.20.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join takes dozens of times longer than the sort-merge
> join.
> So if we could apply the merge-join or hash-join rule to a theta join, it
> would improve performance greatly.
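The idea in the description can be sketched as follows: build a hash table on the equi key of one input, probe it with the other, and evaluate the remaining theta predicate only on pairs that share a key, instead of evaluating the full condition on every pair as a nested-loop join does. For two 10000-row inputs the nested loop evaluates 10000*10000 = 100,000,000 pairs, while the hash join touches each row once plus the key-matching pairs. This is a minimal sketch under stated assumptions: rows are modeled as {{int[] {key, value}}}, and the class and method names are hypothetical. It is not Calcite's {{EnumerableHashJoin}} implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/**
 * Hypothetical sketch: inner theta join whose condition is
 * left.key = right.key AND nonEqui(left, right). Rows are int[] {key, value}.
 */
public class ThetaHashJoinSketch {

  /** Nested-loop join: evaluates the full condition on every pair, O(n * m). */
  static List<int[][]> nestedLoopJoin(List<int[]> left, List<int[]> right,
      BiPredicate<int[], int[]> nonEqui) {
    List<int[][]> out = new ArrayList<>();
    for (int[] l : left) {
      for (int[] r : right) {
        if (l[0] == r[0] && nonEqui.test(l, r)) {
          out.add(new int[][] {l, r});
        }
      }
    }
    return out;
  }

  /** Hash join on the equi key; the non-equi predicate is a post-filter. */
  static List<int[][]> hashJoin(List<int[]> left, List<int[]> right,
      BiPredicate<int[], int[]> nonEqui) {
    // Build phase: index the right input by its join key.
    Map<Integer, List<int[]>> table = new HashMap<>();
    for (int[] r : right) {
      table.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
    }
    // Probe phase: only rows with a matching key reach the theta predicate.
    List<int[][]> out = new ArrayList<>();
    for (int[] l : left) {
      for (int[] r : table.getOrDefault(l[0], List.of())) {
        if (nonEqui.test(l, r)) {
          out.add(new int[][] {l, r});
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> left = List.of(new int[] {1, 10}, new int[] {2, 20}, new int[] {3, 30});
    List<int[]> right = List.of(new int[] {1, 5}, new int[] {1, 15}, new int[] {2, 25});
    // Condition: left.key = right.key AND left.value < right.value
    BiPredicate<int[], int[]> lessThan = (l, r) -> l[1] < r[1];
    System.out.println(nestedLoopJoin(left, right, lessThan).size()); // 2
    System.out.println(hashJoin(left, right, lessThan).size());       // 2
  }
}
```

Applying the residual predicate after the probe, as written here, is only correct for inner joins; an outer join would additionally have to track whether each row found a match that survived the theta predicate before emitting it with nulls.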