[
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813303#comment-17813303
]
Alessandro Solimando commented on CALCITE-6236:
-----------------------------------------------
Equivalent in terms of cardinality (rows), yes, but I am not so sure about cost.
Otherwise there would be no interest at all in CBO: the way you get to the same
result (the same number of rows, i.e. the cardinality) matters in terms of cost.
If we have a situation where "EnumerableBatchNestedLoopJoin" is always preferred
to "JdbcJoin" due to the extra filter the rewrite adds, one could either look
into improving the cardinality estimation for the filter by using data
statistics, or check whether there are any problems with the cost model we use
(e.g., maybe the added cost of "JdbcToEnumerableConverter" should be higher?).
Do you guys have a running example of the same problem happening with
de-correlation, so that we can reason about that too at the same time, without
focusing too much on this specific example?
> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> -----------------------------------------------------------------------
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
> Issue Type: Bug
> Reporter: Ulrich Kramer
> Priority: Major
> Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right
> relation.
> This filter reduces the number of rows by its selectivity (in our case by a
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower
> compared to the one returned for a {{JdbcJoin}}.
> As a result, in most cases {{EnumerableBatchNestedLoopJoin}} is preferred
> over {{JdbcJoin}}.
> Here is an example of the different costs:
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 cumulative_costs=1005.0
>     JdbcToEnumerableConverter rows=100.0 self_costs=10.0 cumulative_costs=190.0
>       JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>       JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
>     JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>       JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
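For reference, the cumulative costs in the two plans quoted above appear to be
each operator's self cost plus the cumulative cost of its inputs. The sketch
below reproduces the two totals under that assumption. The Node class and the
helper methods are hypothetical and are not part of the Calcite API; the
operator names and self costs are copied from the issue description.

```java
public class PlanCostSketch {

    /** Hypothetical plan node: operator name, self cost, and input nodes. */
    static final class Node {
        final String name;
        final double selfCost;
        final Node[] inputs;

        Node(String name, double selfCost, Node... inputs) {
            this.name = name;
            this.selfCost = selfCost;
            this.inputs = inputs;
        }

        /** Self cost plus the cumulative cost of every input (assumed model). */
        double cumulativeCost() {
            double total = selfCost;
            for (Node input : inputs) {
                total += input.cumulativeCost();
            }
            return total;
        }
    }

    /** First plan from the issue: join executed by Enumerable, with the extra filter. */
    static Node batchNestedLoopPlan() {
        Node left = new Node("JdbcToEnumerableConverter", 10.0,
            new Node("JdbcProject", 80.0,
                new Node("JdbcTableScan", 100.0)));
        Node right = new Node("JdbcToEnumerableConverter", 2.5,
            new Node("JdbcFilter", 25.0,
                new Node("JdbcTableScan", 100.0)));
        return new Node("EnumerableProject", 460.0,
            new Node("EnumerableBatchNestedLoopJoin", 687.5, left, right));
    }

    /** Second plan from the issue: join pushed down to JDBC. */
    static Node jdbcJoinPlan() {
        return new Node("JdbcToEnumerableConverter", 158.5,
            new Node("JdbcJoin", 1585.0,
                new Node("JdbcProject", 80.0,
                    new Node("JdbcTableScan", 100.0)),
                new Node("JdbcTableScan", 100.0)));
    }

    public static void main(String[] args) {
        System.out.println("batch nested loop plan: " + batchNestedLoopPlan().cumulativeCost());
        System.out.println("jdbc join plan:         " + jdbcJoinPlan().cumulativeCost());
    }
}
```

With these figures the totals are 1465.0 and 2023.5. Most of the gap comes from
the join's self cost (687.5 vs. 1585.0), which follows the estimated row count
(460 vs. 1585), rather than from the converter cost.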