[
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813276#comment-17813276
]
Ruben Q L commented on CALCITE-6236:
------------------------------------
{quote}Rules create semantically equivalent plans. Someone could argue that
equivalent means that they should have the same number of rows/costs.
{quote}
I'd argue that they would have the same rowCount, but can have different costs
(e.g. a MergeJoin can have higher cost than its equivalent HashJoin, since the
former requires the inputs to be sorted).
Circling back to the "correction factor" approach for EBNLJ, what if:
- When creating the EBNLJ, we store the selectivity of the correlate filter
that is applied to the (original) RHS.
- We know that, from that point on, the EBNLJ's new RHS will have its rowCount
reduced due to the correlate filter that has been applied.
- For the rowCount estimation of the EBNLJ, we can get back the original
rowCount of the RHS by doing something like:
{code:java}
adjusted_rowCount_RHS = rowCount_RHS / selectivity_of_correlate_filter
{code}
And then we use that adjusted_rowCount_RHS (instead of the filtered rowCount_RHS)
in the computation of the EBNLJ's rowCount?
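To make the idea concrete, here is a minimal standalone sketch of the adjustment. The class and method names are made up for illustration and are not Calcite API. The fallback for an unusable selectivity is also just an assumption on my side, not something the proposal has settled.

```java
/** Hypothetical helper illustrating the "correction factor"; not part of Calcite. */
final class EbnljRowCountSketch {
  private EbnljRowCountSketch() {}

  /**
   * Recovers an estimate of the original RHS row count by undoing the
   * reduction introduced by the correlate filter.
   *
   * @param filteredRhsRowCount row count of the EBNLJ's new (filtered) RHS
   * @param correlateFilterSelectivity selectivity stored when the EBNLJ was created
   */
  static double adjustedRhsRowCount(double filteredRhsRowCount,
      double correlateFilterSelectivity) {
    // Assumption: if the stored selectivity is outside (0, 1], we cannot
    // divide by it meaningfully, so we keep the filtered row count as-is.
    if (correlateFilterSelectivity <= 0.0 || correlateFilterSelectivity > 1.0) {
      return filteredRhsRowCount;
    }
    return filteredRhsRowCount / correlateFilterSelectivity;
  }

  public static void main(String[] args) {
    // Numbers from the issue description: the RHS scan has 100 rows and the
    // correlate filter leaves 25, i.e. a selectivity of 0.25.
    double selectivity = 25.0 / 100.0;
    System.out.println(adjustedRhsRowCount(25.0, selectivity)); // prints 100.0
  }
}
```

With the numbers from the issue description this gives back the 100 rows of the unfiltered RHS, which is what a JdbcJoin over the same inputs would see.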
> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> -----------------------------------------------------------------------
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
> Issue Type: Bug
> Reporter: Ulrich Kramer
> Priority: Major
> Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right
> relation.
> This filter reduces the number of rows by its selectivity (in our case by a
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower
> compared to the one returned for a {{JdbcJoin}}.
> As a result, in most cases {{EnumerableBatchNestedLoopJoin}} is preferred
> over {{JdbcJoin}}.
> Here is an example of the different costs:
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 cumulative_costs=1005.0
>     JdbcToEnumerableConverter rows=100.0 self_costs=10.0 cumulative_costs=190.0
>       JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>       JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
>     JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>       JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}