[ 
https://issues.apache.org/jira/browse/FLINK-37329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated FLINK-37329:
------------------------------------------
    Description: 
Currently, when "table.optimizer.source.report-statistics-enabled" is set to 
false, statistics collection is not disabled in all cases. When running a 
batch workload that reads TPC-DS data from Hive tables, both table and column 
statistics were still collected even though 
"table.optimizer.source.report-statistics-enabled" was set to false.

The job was hitting this code path:
[https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/optimize/program/FlinkRecomputeStatisticsProgram.java#L133]

This goes against the configuration description:
{code:java}
@Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
public static final ConfigOption<Boolean> TABLE_OPTIMIZER_SOURCE_REPORT_STATISTICS_ENABLED =
        key("table.optimizer.source.report-statistics-enabled")
                .booleanType()
                .defaultValue(true)
                .withDescription(
                        "When it is true, the optimizer will collect and use the statistics from source connectors"
                                + " if the source extends from SupportsStatisticReport and the statistics from catalog is UNKNOWN."
                                + "Default value is true."); {code}
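
As an illustration of the expected behavior, here is a minimal sketch. It does not use Flink's planner classes: StatsGuardSketch, StatsReportingSource and recomputeRowCount are hypothetical names, and a plain Map stands in for the table configuration. The only point is that the flag is checked before the source is asked for statistics, so a disabled flag means the source is never touched.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Flink code: models the guard this issue asks for.
final class StatsGuardSketch {
    static final String REPORT_STATS_KEY =
            "table.optimizer.source.report-statistics-enabled";

    /** Hypothetical stand-in for a source that can report statistics. */
    interface StatsReportingSource {
        long reportRowCount();
    }

    /**
     * Returns the row count reported by the source only when the option is
     * enabled (default: true). When it is disabled, the source is not called.
     */
    static Optional<Long> recomputeRowCount(
            Map<String, String> config, StatsReportingSource source) {
        boolean enabled =
                Boolean.parseBoolean(config.getOrDefault(REPORT_STATS_KEY, "true"));
        if (!enabled) {
            return Optional.empty(); // skip collection entirely
        }
        return Optional.of(source.reportRowCount());
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        StatsReportingSource source = () -> {
            calls.incrementAndGet(); // count how often the source is queried
            return 42L;
        };

        Optional<Long> disabled =
                recomputeRowCount(Map.of(REPORT_STATS_KEY, "false"), source);
        System.out.println("disabled -> " + disabled + ", calls = " + calls.get());

        Optional<Long> enabled = recomputeRowCount(Map.of(), source);
        System.out.println("enabled  -> " + enabled + ", calls = " + calls.get());
    }
}
```

In the real planner the check would presumably read the ConfigOption quoted above from the table configuration; where exactly that check belongs in FlinkRecomputeStatisticsProgram is not something this sketch claims to know.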


> Skip Source Stats Collection When 
> "table.optimizer.source.report-statistics-enabled" is false
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-37329
>                 URL: https://issues.apache.org/jira/browse/FLINK-37329
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when "table.optimizer.source.report-statistics-enabled" is set to 
> false, statistics collection is not disabled in all cases. When running a 
> batch workload that reads TPC-DS data from Hive tables, both table and column 
> statistics were still collected even though 
> "table.optimizer.source.report-statistics-enabled" was set to false.
> The job was hitting this code path:
> [https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/optimize/program/FlinkRecomputeStatisticsProgram.java#L133]
> This goes against the configuration description:
> {code:java}
> @Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
> public static final ConfigOption<Boolean> TABLE_OPTIMIZER_SOURCE_REPORT_STATISTICS_ENABLED =
>         key("table.optimizer.source.report-statistics-enabled")
>                 .booleanType()
>                 .defaultValue(true)
>                 .withDescription(
>                         "When it is true, the optimizer will collect and use the statistics from source connectors"
>                                 + " if the source extends from SupportsStatisticReport and the statistics from catalog is UNKNOWN."
>                                 + "Default value is true."); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
