[ https://issues.apache.org/jira/browse/CALCITE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Solimando updated CALCITE-7365:
------------------------------------------
Description:
The _SingleRel_ handler in _RelMdRowCount_
([here|https://github.com/apache/calcite/blob/506950a3ebd4807b36901bededc64f7c60497712/core/src/main/java/org/apache/calcite/rel/metadata/RelMdRowCount.java#L194-L197])
always returns the input's row count, ignoring any _estimateRowCount()_
override in subclasses:
{code:java}
public @Nullable Double getRowCount(SingleRel rel, RelMetadataQuery mq) {
  return mq.getRowCount(rel.getInput());
}
{code}
This makes it impossible for custom _SingleRel_ operators to provide accurate
row count estimates without implementing a custom metadata handler.
The _RelNode_ catch-all handler
([here|https://github.com/apache/calcite/blob/506950a3ebd4807b36901bededc64f7c60497712/core/src/main/java/org/apache/calcite/rel/metadata/RelMdRowCount.java#L64-L72])
correctly delegates to {_}estimateRowCount(){_}:
{code:java}
public @Nullable Double getRowCount(RelNode rel, RelMetadataQuery mq) {
  return rel.estimateRowCount(mq);
}
{code}
The _SingleRel_ handler should do the same for consistency.
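To make the shadowing concrete, here is a toy model in plain Java, not Calcite code. The classes _RowCountDispatchSketch_, _Node_, _Leaf_, _Single_ and _Expanding_ are hypothetical stand-ins for _RelNode_, a table scan, _SingleRel_ and the reproducer's _ExpandingRel_, and a simple _instanceof_ check approximates the metadata dispatch picking the most specific handler method. It only illustrates the pre-fix behavior described above.
{code:java}
/**
 * Toy model (hypothetical classes, not Calcite's API) of the pre-fix
 * row-count dispatch described in CALCITE-7365.
 */
final class RowCountDispatchSketch {
  /** Stand-in for RelNode; 1.0 is an arbitrary default for the sketch. */
  abstract static class Node {
    double estimateRowCount() {
      return 1.0;
    }
  }

  /** Stand-in for a table scan with a known row count. */
  static final class Leaf extends Node {
    private final double rows;

    Leaf(double rows) {
      this.rows = rows;
    }

    @Override double estimateRowCount() {
      return rows;
    }
  }

  /** Stand-in for SingleRel: by default, as many rows as its input. */
  static class Single extends Node {
    final Node input;

    Single(Node input) {
      this.input = input;
    }

    @Override double estimateRowCount() {
      return rowCount(input);
    }
  }

  /** Stand-in for the reproducer's ExpandingRel: 10x its input. */
  static final class Expanding extends Single {
    Expanding(Node input) {
      super(input);
    }

    @Override double estimateRowCount() {
      return rowCount(input) * 10.0;
    }
  }

  /** Picks the most specific handler (approximation of metadata dispatch). */
  static double rowCount(Node rel) {
    return rel instanceof Single ? rowCountSingle((Single) rel) : rowCountNode(rel);
  }

  /** Pre-fix SingleRel handler: never consults an estimateRowCount() override. */
  static double rowCountSingle(Single rel) {
    return rowCount(rel.input);
  }

  /** Catch-all RelNode handler: delegates to the node's own estimate. */
  static double rowCountNode(Node rel) {
    return rel.estimateRowCount();
  }
}
{code}
With a 14-row _Leaf_ as input, _Expanding.estimateRowCount()_ computes 140.0, yet _rowCount()_ reports 14.0 because the _Single_ handler is chosen and goes straight to the input; passing the same node to _rowCountNode()_ instead would return 140.0.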
Reproducer (to be added to _RelMetadataTest_):
{code:java}
/** A SingleRel whose output has EXPANSION_FACTOR times as many rows as its input. */
private static class ExpandingRel extends SingleRel {
  private static final double EXPANSION_FACTOR = 10.0;

  ExpandingRel(RelOptCluster cluster, RelTraitSet traits, RelNode input) {
    super(cluster, traits, input);
  }

  @Override public double estimateRowCount(RelMetadataQuery mq) {
    return mq.getRowCount(input) * EXPANSION_FACTOR;
  }
}

@Test void testRowCountCustomSingleRel() {
  final RelNode scan = sql("select * from emp").toRel();
  final ExpandingRel expanding =
      new ExpandingRel(scan.getCluster(), scan.getTraitSet(), scan);
  final RelMetadataQuery mq = scan.getCluster().getMetadataQuery();
  final Double rowCount = mq.getRowCount(expanding);
  // Before the fix this returns 14.0 (the input's row count) instead of
  // 140.0 (input * 10), so the assertion fails.
  assertThat(rowCount, is(EMP_SIZE * 10));
}
{code}
Fix:
Change the _SingleRel_ handler to delegate to {_}estimateRowCount(){_}:
{code:java}
public @Nullable Double getRowCount(SingleRel rel, RelMetadataQuery mq) {
  return rel.estimateRowCount(mq);
}
{code}
This is backward compatible for subclasses that do not override _estimateRowCount()_, since the default _SingleRel.estimateRowCount()_ already
returns _mq.getRowCount(input)_ (see
[here|https://github.com/apache/calcite/blob/506950a3ebd4807b36901bededc64f7c60497712/core/src/main/java/org/apache/calcite/rel/SingleRel.java#L66-L69]).
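Both halves of this claim can be checked on a toy model in plain Java, not Calcite code. The classes _FixedDispatchSketch_, _Node_, _Leaf_, _Single_ and _Expanding_ are hypothetical stand-ins for _RelNode_, a table scan, _SingleRel_ and the reproducer's _ExpandingRel_, and an _instanceof_ check approximates dispatch to the most specific handler. The _Single_ handler here applies the proposed fix.
{code:java}
/**
 * Toy model (hypothetical classes, not Calcite's API) of the row-count
 * dispatch after the proposed CALCITE-7365 fix.
 */
final class FixedDispatchSketch {
  /** Stand-in for RelNode; 1.0 is an arbitrary default for the sketch. */
  abstract static class Node {
    double estimateRowCount() {
      return 1.0;
    }
  }

  /** Stand-in for a table scan with a known row count. */
  static final class Leaf extends Node {
    private final double rows;

    Leaf(double rows) {
      this.rows = rows;
    }

    @Override double estimateRowCount() {
      return rows;
    }
  }

  /** Stand-in for SingleRel: by default, as many rows as its input. */
  static class Single extends Node {
    final Node input;

    Single(Node input) {
      this.input = input;
    }

    @Override double estimateRowCount() {
      return rowCount(input);
    }
  }

  /** Stand-in for the reproducer's ExpandingRel: 10x its input. */
  static final class Expanding extends Single {
    Expanding(Node input) {
      super(input);
    }

    @Override double estimateRowCount() {
      return rowCount(input) * 10.0;
    }
  }

  /** Picks the most specific handler (approximation of metadata dispatch). */
  static double rowCount(Node rel) {
    return rel instanceof Single ? rowCountSingle((Single) rel) : rowCountNode(rel);
  }

  /** Fixed SingleRel handler: delegates to estimateRowCount(), like the catch-all. */
  static double rowCountSingle(Single rel) {
    return rel.estimateRowCount();
  }

  /** Catch-all RelNode handler. */
  static double rowCountNode(Node rel) {
    return rel.estimateRowCount();
  }
}
{code}
For a 14-row _Leaf_, an _Expanding_ node now reports 140.0 (and two stacked _Expanding_ nodes 1400.0), while a plain _Single_ without an override still reports 14.0, which is the backward-compatible case. The two handler methods are kept separate only to mirror the structure of _RelMdRowCount_; in this sketch they now behave identically.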
> RelMdRowCount ignores estimateRowCount() overrides in SingleRel's subclasses
> ----------------------------------------------------------------------------
>
> Key: CALCITE-7365
> URL: https://issues.apache.org/jira/browse/CALCITE-7365
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
--
This message was sent by Atlassian Jira
(v8.20.10#820010)