It turned out to be more complicated than I thought. The fix of EnumerableMergeJoin uncovered a well-known infinite planning time issue https://issues.apache.org/jira/browse/CALCITE-2223
The thing is previously the rule did not even try to sort its inputs, thus it was producing value only for the case when the inputs **happen to be** sorted. I thought it would make sense to consider plans like MergeJoin(Sort(..), Sort(..)) as well, so I altered the rule so it creates Sort nodes in case the input is not sorted. So far so good, however, a trivial case like JdbcTest#testJoinManyWay degrades significantly because now it has much more nodes to explore. In current master, testJoinManyWay completes in 7 seconds (see https://travis-ci.org/apache/calcite/jobs/630585211#L1624 ) However, the test does not complete if I activate MergeJoin rule. I did try to fix that, and I have a couple of commits to make it a bit more stable (see commits in https://github.com/apache/calcite/pull/1702) Basically 1) I ensure RexNodes are canonicalized at construction time (e.g. "=($1, $0)" is always created as "=($0, $1)") ^^ This seems to produce lots of test failures, however, it would make test output more stable which is good. 2) I add a couple of call.rel(0).getConvention() == call.rel(1).getConvention(); checks to rules like FilterProjectTransposeRule. The net result is testJoinManyWay takes 140 seconds on my machine. It is not perfect, but at least it manages to complete for 6 joins. An alternative option is to have **two** (or three) flavours of MergeJoinRule: A) A rule that just reuses sorted inputs if there are any (less sort nodes => less planning space => faster planning) B) A rule that tries adding Sort nodes (more sort nodes => longer planning, but it might happen to produce creative plans) C) A rule that tries adding Sort node only in case one of its inputs is already sorted appropriately (a mix between A and B) Any thoughts? Vladimir
