[ https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=859117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859117 ]
ASF GitHub Bot logged work on HIVE-27163:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Apr/23 09:19
            Start Date: 26/Apr/23 09:19
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1177588895


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -508,6 +511,48 @@ public static void clearQuickStats(Map<String, String> params) {
     params.remove(StatsSetupConst.NUM_ERASURE_CODED_FILES);
   }
+  public static void updateTableStatsForCreateTable(Warehouse wh, Database db, Table tbl,
+      EnvironmentContext envContext, Configuration conf, Path tblPath, boolean newDir)
+      throws MetaException {
+    // If the created table is a view, skip generating the stats
+    if (MetaStoreUtils.isView(tbl)) {
+      return;
+    }
+    assert tblPath != null;
+    if (tbl.isSetDictionary() && tbl.getDictionary().getValues() != null) {
+      List<java.nio.ByteBuffer> values = tbl.getDictionary().getValues().
+          remove(StatsSetupConst.STATS_FOR_CREATE_TABLE);
+      java.nio.ByteBuffer buffer;
+      if (values != null && values.size() > 0 && (buffer = values.get(0)).hasArray()) {
+        String val = new String(buffer.array(), StandardCharsets.UTF_8);
+        if (StatsSetupConst.TRUE.equals(val)) {
+          try {
+            boolean isIcebergTable =
+                HiveMetaHook.ICEBERG.equalsIgnoreCase(tbl.getParameters().get(HiveMetaHook.TABLE_TYPE));
+            PathFilter pathFilter = isIcebergTable ?
+                path -> !"metadata".equals(path.getName()) : FileUtils.HIDDEN_FILES_PATH_FILTER;

Review Comment:
   Move this part to `Table#isIcebergTable` on the client, https://github.com/apache/hive/pull/4228/files#diff-a88cd54666ea2466fd0c0f1323efecdc0be83a3fbf726d471f94080e31affc15


Issue Time Tracking
-------------------

    Worklog Id:     (was: 859117)
    Time Spent: 2h 10m  (was: 2h)

> Column stats are not getting published after an insert query into an external
> table with custom location
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27163
>                 URL: https://issues.apache.org/jira/browse/HIVE-27163
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Taraka Rama Rao Lethavadla
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>
> {noformat}
> #### A masked pattern was here ####
> PREHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
> #### A masked pattern was here ####
> POSTHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name                age
> data_type               int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> comment                 from deserializer{noformat}
> As we can see from desc formatted output, column stats were not populated
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)