[ https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=860937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860937 ]
ASF GitHub Bot logged work on HIVE-27163:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/May/23 08:08
            Start Date: 08/May/23 08:08
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1187154337


##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##########
@@ -921,14 +925,23 @@ public Table toTable(HiveConf conf) throws HiveException {
     // When replicating the statistics for a table will be obtained from the source. Do not
     // reset it on replica.
     if (replicationSpec == null || !replicationSpec.isInReplicationScope()) {
-      if (!this.isCTAS && (tbl.getPath() == null || (!isExternal() && tbl.isEmpty()))) {
-        if (!tbl.isPartitioned() && conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
-          StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(),
-              MetaStoreUtils.getColumnNames(tbl.getCols()), StatsSetupConst.TRUE);
-        }
-      } else {
-        StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), null,
-            StatsSetupConst.FALSE);
+      // Remove COLUMN_STATS_ACCURATE=true from table's parameter, let the HMS determine if
+      // there is need to add column stats dependent on the table's location.
+      StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), null,
+          StatsSetupConst.FALSE);
+      if (!this.isCTAS && !tbl.isPartitioned() && !tbl.isTemporary() &&
+          conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+        // Put the flag into the dictionary in order not to pollute the table,
+        // ObjectDictionary is meant to convey repeatitive messages.
+        ObjectDictionary dictionary = tbl.getTTable().isSetDictionary() ?
+            tbl.getTTable().getDictionary() : new ObjectDictionary();
+        List<ByteBuffer> buffers = new ArrayList<>();
+        String statsSetup = StatsSetupConst.ColumnStatsSetup.getStatsSetupAsString(true,
+            tbl.isIcebergTable() ? "metadata" : null, // Skip metadata directory for Iceberg table

Review Comment:
   `HiveStorageHandler` does not have an API for this purpose, and I'm a little nervous about introducing a new one in `HiveStorageHandler`.
   I removed `isIcebergTable()` from the `Table` class and now use `storageHandler.isMetadataTableSupported()` (currently only supported by Iceberg tables) instead.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 860937)
    Time Spent: 4h  (was: 3h 50m)

> Column stats are not getting published after an insert query into an external
> table with custom location
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27163
>                 URL: https://issues.apache.org/jira/browse/HIVE-27163
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Taraka Rama Rao Lethavadla
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>
>
> {noformat}
> #### A masked pattern was here ####
> PREHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
> #### A masked pattern was here ####
> POSTHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name            	age
> data_type           	int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> comment             	from deserializer{noformat}
> As we can see from the desc formatted output, column stats were not populated.
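
For readers following the diff quoted in the worklog, the new control flow in `toTable` can be modelled roughly as below. This is only a sketch under stated assumptions, not Hive code: the class `StatsSetupSketch` and both method names are hypothetical, the table parameters are reduced to a plain `Map`, and the reset is approximated as removing the `COLUMN_STATS_ACCURATE` key, which is what the comment in the diff describes. The `ObjectDictionary` handling from the PR is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

public class StatsSetupSketch {

    // Hypothetical helper: approximates the unconditional reset in the diff,
    // which (per its comment) removes COLUMN_STATS_ACCURATE from the
    // table parameters at create time.
    public static void resetStatsState(Map<String, String> tableParams) {
        tableParams.remove("COLUMN_STATS_ACCURATE");
    }

    // Hypothetical helper: mirrors the condition guarding the new block in
    // the diff. The stats-setup hint is attached only for tables that are
    // not CTAS, not partitioned and not temporary, and only when stats
    // autogather is enabled.
    public static boolean shouldAttachStatsSetup(boolean isCtas, boolean isPartitioned,
                                                 boolean isTemporary, boolean statsAutogather) {
        return !isCtas && !isPartitioned && !isTemporary && statsAutogather;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("COLUMN_STATS_ACCURATE", "true");
        resetStatsState(params);

        // Plain unpartitioned, non-temporary table with autogather on.
        System.out.println(shouldAttachStatsSetup(false, false, false, true));
        // CTAS tables are excluded by the guard.
        System.out.println(shouldAttachStatsSetup(true, false, false, true));
        // The accurate marker is gone after the reset.
        System.out.println(params.containsKey("COLUMN_STATS_ACCURATE"));
    }
}
```

Unlike the removed code, the guard in the diff no longer looks at `isExternal()` or the table path on the client side, which appears to be what lets an external table with a custom location, as in the test case, take the same route.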