[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531204#comment-14531204 ]
Dongwook Kwon commented on HIVE-10631:
--------------------------------------

If the intention of line 1363 (MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true)) is to update fast stats regardless of whether the table was just created (which means no MSCK REPAIR PARTITIONS has run, even for an existing external table), then I believe line 1363 should be something like the code below. I still don't understand why it tries to update stats before the metastore knows about the partitions, but perhaps this was the intention of HIVE-3959.

{code}
FileStatus[] fileStatus = wh.getFileStatusesForUnpartitionedTable(db, tbl);
MetaStoreUtils.updateUnpartitionedTableStatsFast(tbl, fileStatus, fileStatus.length == 0, false);
{code}

Otherwise it should be like this, at least to avoid scanning folders for an unnecessary operation.

{code}
MetaStoreUtils.updateUnpartitionedTableStatsFast(tbl, null, true, false);
{code}

Just my thought.

> create_table_core method has invalid update for Fast Stats
> ----------------------------------------------------------
>
>                 Key: HIVE-10631
>                 URL: https://issues.apache.org/jira/browse/HIVE-10631
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.13.0, 1.0.0
>            Reporter: Dongwook Kwon
>            Priority: Minor
>
> The HiveMetaStore.create_table_core method calls
> MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather
> is on. However, for a partitioned table, this updateUnpartitionedTableStatsFast
> call scans the warehouse dir and doesn't seem to use the result.
> "Fast Stats" was implemented by HIVE-3959.
> https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
> From the create_table_core method:
> {code}
>         if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
>             !MetaStoreUtils.isView(tbl)) {
>           if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
>             MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
>           } else { // Partitioned table with no partitions.
>             MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
>           }
>         }
> {code}
> Particularly line 1363: // Partitioned table with no partitions.
> {code}
>             MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
> {code}
> This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and
> then does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast
> method, because the newDir flag is always true.
> The impact of this bug is minor with an HDFS warehouse
> location (hive.metastore.warehouse.dir), but it could be big with an S3
> warehouse location, especially for large existing partitions.
> The impact is also heightened by HIVE-6727 when the warehouse location is S3:
> basically it could scan the wrong S3 directory recursively and do nothing
> with it. I will add more details of the cases in comments.
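
To make the cost described above concrete, here is a small self-contained sketch. It is a simplified model based only on the description in this issue, not the real Hive code: FastStatsSketch, CountingLister, updateListFirst and updateCheckFirst are hypothetical names, and the "numFiles" / "totalSize" keys are used purely for illustration. It compares listing the directory before checking the newDir flag with checking the flag first.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the behavior described in this issue. NOT the real Hive classes. */
public class FastStatsSketch {

    /** Hypothetical stand-in for a warehouse listing; counts how many scans happen. */
    static class CountingLister {
        private final int fileCount;
        int calls = 0;

        CountingLister(int fileCount) {
            this.fileCount = fileCount;
        }

        /** Pretends to list the table directory; every file is assumed to be 128 bytes. */
        long[] listFileSizes() {
            calls++;
            long[] sizes = new long[fileCount];
            Arrays.fill(sizes, 128L);
            return sizes;
        }
    }

    /** Ordering described in the issue: scan first, then ignore the result when newDir is true. */
    static boolean updateListFirst(Map<String, String> params, CountingLister lister, boolean newDir) {
        long[] sizes = lister.listFileSizes(); // the scan always happens
        if (newDir) {
            return false; // the listing is discarded
        }
        populate(params, sizes);
        return true;
    }

    /** Alternative ordering: skip the scan entirely when newDir is true. */
    static boolean updateCheckFirst(Map<String, String> params, CountingLister lister, boolean newDir) {
        if (newDir) {
            return false; // no listing was needed
        }
        populate(params, lister.listFileSizes());
        return true;
    }

    /** Writes the file count and total size into the parameter map. */
    private static void populate(Map<String, String> params, long[] sizes) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        params.put("numFiles", String.valueOf(sizes.length));
        params.put("totalSize", String.valueOf(total));
    }

    public static void main(String[] args) {
        CountingLister listFirst = new CountingLister(3);
        updateListFirst(new HashMap<>(), listFirst, true);
        System.out.println("list-first scans with newDir=true: " + listFirst.calls);

        CountingLister checkFirst = new CountingLister(3);
        updateCheckFirst(new HashMap<>(), checkFirst, true);
        System.out.println("check-first scans with newDir=true: " + checkFirst.calls);
    }
}
```

In this model the list-first ordering pays for one directory scan even though nothing is written, which is the part that would become expensive on an S3-backed warehouse. Whether the real method can be reordered this way depends on details of MetaStoreUtils that are not shown in this issue.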