[ https://issues.apache.org/jira/browse/FLINK-37329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Syed Shameerur Rahman updated FLINK-37329:
------------------------------------------
    Description:
Currently, when "table.optimizer.source.report-statistics-enabled" is set to false, statistics collection is not disabled in all cases. It was noted that when running a batch workload to read the TPC-DS data set from Hive tables, both table and column statistics were being collected even though "table.optimizer.source.report-statistics-enabled" was set to false.

It was hitting the code path:
[https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/optimize/program/FlinkRecomputeStatisticsProgram.java#L133]

This goes against the configuration description:
{code:java}
@Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
public static final ConfigOption<Boolean> TABLE_OPTIMIZER_SOURCE_REPORT_STATISTICS_ENABLED =
        key("table.optimizer.source.report-statistics-enabled")
                .booleanType()
                .defaultValue(true)
                .withDescription(
                        "When it is true, the optimizer will collect and use the statistics from source connectors"
                                + " if the source extends from SupportsStatisticReport and the statistics from catalog is UNKNOWN."
                                + "Default value is true.");
{code}

  was:
Currently, when "table.optimizer.source.report-statistics-enabled" is set to false, statistics collection is not disabled in all cases. It was noted that when running a batch workload to read the TPC-DS data set from Hive tables, both table and column statistics were being collected even though "table.optimizer.source.report-statistics-enabled" was set to false.

It was hitting the code path:
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/optimize/program/FlinkRecomputeStatisticsProgram.java#L133

This goes against the configuration description:
{code:java}
@Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
public static final ConfigOption<Boolean> TABLE_OPTIMIZER_SOURCE_REPORT_STATISTICS_ENABLED =
        key("table.optimizer.source.report-statistics-enabled")
                .booleanType()
                .defaultValue(true)
                .withDescription(
                        "When it is true, the optimizer will collect and use the statistics from source connectors"
                                + " if the source extends from SupportsStatisticReport and the statistics from catalog is UNKNOWN."
                                + "Default value is true.");
{code}

> Skip Source Stats Collection When "table.optimizer.source.report-statistics-enabled" is false
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-37329
>                 URL: https://issues.apache.org/jira/browse/FLINK-37329
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when "table.optimizer.source.report-statistics-enabled" is set to
> false, statistics collection is not disabled in all cases. It was noted that
> when running a batch workload to read the TPC-DS data set from Hive tables,
> both table and column statistics were being collected even though
> "table.optimizer.source.report-statistics-enabled" was set to false.
>
> It was hitting the code path:
> [https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/optimize/program/FlinkRecomputeStatisticsProgram.java#L133]
>
> This goes against the configuration description:
> {code:java}
> @Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
> public static final ConfigOption<Boolean> TABLE_OPTIMIZER_SOURCE_REPORT_STATISTICS_ENABLED =
>         key("table.optimizer.source.report-statistics-enabled")
>                 .booleanType()
>                 .defaultValue(true)
>                 .withDescription(
>                         "When it is true, the optimizer will collect and use the statistics from source connectors"
>                                 + " if the source extends from SupportsStatisticReport and the statistics from catalog is UNKNOWN."
>                                 + "Default value is true.");
> {code}
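
For illustration only, the behaviour that the configuration description implies can be sketched as a single guard. This is a minimal, hypothetical model and not the actual code in FlinkRecomputeStatisticsProgram: the class name SourceStatsGate, the method shouldCollect, and the plain Map standing in for the table configuration are all assumptions made for the example. Only the option key and its default value of true come from the description quoted above.

```java
import java.util.Map;

/** Hypothetical model of the expected gating; not Flink's planner code. */
final class SourceStatsGate {
    // Option key and default (true) are taken from the issue description.
    static final String KEY = "table.optimizer.source.report-statistics-enabled";

    /**
     * Returns true only if source statistics should be collected.
     * The two boolean parameters are stand-ins (assumptions) for
     * "the source implements SupportsStatisticReport" and
     * "the catalog statistics are UNKNOWN".
     */
    static boolean shouldCollect(
            Map<String, String> conf,
            boolean sourceSupportsReport,
            boolean catalogStatsUnknown) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(KEY, "true"));
        // When the option is false, no code path should collect statistics.
        return enabled && sourceSupportsReport && catalogStatsUnknown;
    }

    public static void main(String[] args) {
        // Option explicitly disabled: collection is skipped.
        System.out.println(shouldCollect(Map.of(KEY, "false"), true, true));
        // Option unset: the default of true applies.
        System.out.println(shouldCollect(Map.of(), true, true));
    }
}
```

Under this model, the behaviour reported in the issue corresponds to statistics being collected on a path that never consults the flag, so the first call above would effectively behave as if it returned true.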