[ https://issues.apache.org/jira/browse/HIVE-26504?focusedWorklogId=809953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-809953 ]
ASF GitHub Bot logged work on HIVE-26504:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Sep/22 07:09
            Start Date: 19/Sep/22 07:09
    Worklog Time Spent: 10m
      Work Description: veghlaci05 commented on code in PR #3557:
URL: https://github.com/apache/hive/pull/3557#discussion_r973924216


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java:
##########
@@ -226,16 +229,21 @@ private static ColumnStatistics updateStatsForAlterPart(RawStore rawStore, Table
   private static void updateStatsForAlterTable(RawStore rawStore, Table tblBefore, Table tblAfter, String catalogName, String dbName, String tableName) throws Exception {
     ColumnStatistics colStats = null;
-    List<String> deletedCols = new ArrayList<>();
     if (tblBefore.isSetPartitionKeys()) {
       List<Partition> parts = sharedCache.listCachedPartitions(catalogName, dbName, tableName, -1);
       for (Partition part : parts) {
         colStats = updateStatsForAlterPart(rawStore, tblBefore, catalogName, dbName, tableName, part);
       }
     }
-    List<ColumnStatistics> multiColumnStats = HiveAlterHandler
-        .alterTableUpdateTableColumnStats(rawStore, tblBefore, tblAfter, null, null, rawStore.getConf(), deletedCols);
+    rawStore.alterTable(catalogName, dbName, tblBefore.getTableName(), tblAfter, null);
+
+    Set<String> deletedCols = new HashSet<>();
+    List<ColumnStatistics> multiColumnStats = HiveAlterHandler.getColumnStats(rawStore, tblBefore);
+    multiColumnStats.forEach(cs ->
+      deletedCols.addAll(HiveAlterHandler.filterColumnStatsForTableColumns(tblBefore.getSd().getCols(), cs)

Review Comment:
   The `deletedCols.addAll()` call is inside a foreach, so simple assignment is not possible. And yes, it was part of the `alterTableUpdateTableColumnStats`. There was a kind of "dry run" mode in which no changes were made, only the deletedColumns list was filled. I found that approach a bit clunky, as it made the code hard to read by adding a lot of extra if-else statements.
   So I decided to extract the filtering logic into a separate method which can be called both from here and from `HiveAlterHandler`



Issue Time Tracking
-------------------

    Worklog Id:     (was: 809953)
    Time Spent: 2h 20m  (was: 2h 10m)

> User is not able to drop table
> ------------------------------
>
>                 Key: HIVE-26504
>                 URL: https://issues.apache.org/jira/browse/HIVE-26504
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Hive won't store anything in *TAB_COL_STATS* for partitioned table, whereas
> impala stores complete column stats in TAB_COL_STATS for partitioned table.
> Deleting entries in TAB_COL_STATS is based on (DB_NAME, TABLE_NAME), not by
> TBL_ID. Renamed tables were having old names in TAB_COL_STATS.
> To Repro:
> {code:java}
> beeline:
> set hive.create.as.insert.only=false;
> set hive.create.as.acid=false;
> create table testes.table_name_with_partition (id tinyint, name string) partitioned by (col_to_partition bigint) stored as parquet;
> insert into testes.table_name_with_partition (id, name, col_to_partition) values (1, "a", 2020), (2, "b", 2021), (3, "c", 2022);
> impala:
> compute stats testes.table_name_with_partition; -- backend shows new entries in TAB_COL_STATS
> beeline:
> alter table testes.table_name_with_partition rename to testes2.table_that_cant_be_droped;
> drop table testes2.table_that_cant_be_droped; -- This fails with TAB_COL_STATS_fkey constraint violation.
> {code}
> Exception trace for drop table failure
> {code:java}
> Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "TBLS" violates foreign key constraint "TAB_COL_STATS_fkey" on table "TAB_COL_STATS"
>   Detail: Key (TBL_ID)=(19816) is still referenced from table "TAB_COL_STATS".
>     at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532)
>     at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267)
>     ... 50 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
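
The failure mode described in the issue (stats rows deleted by (DB_NAME, TABLE_NAME) rather than by TBL_ID, so a renamed table leaves rows behind that still reference its TBL_ID) can be illustrated with a small in-memory model. This is only a sketch, not Hive code: `StatsCleanupSketch`, `StatRow`, `deleteByName`, `deleteByTblId` and `statsAfterRename` are hypothetical names, and the list stands in for the TAB_COL_STATS table. The table names and the TBL_ID value are taken from the repro and the exception trace above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of TAB_COL_STATS; not part of Hive.
class StatsCleanupSketch {

    // One stats row: it carries both the table id and the (possibly stale) names.
    static final class StatRow {
        final long tblId;
        final String dbName;
        final String tableName;
        final String column;

        StatRow(long tblId, String dbName, String tableName, String column) {
            this.tblId = tblId;
            this.dbName = dbName;
            this.tableName = tableName;
            this.column = column;
        }
    }

    // Name-keyed delete: only removes rows whose stored names match.
    static int deleteByName(List<StatRow> stats, String db, String table) {
        int before = stats.size();
        stats.removeIf(r -> r.dbName.equals(db) && r.tableName.equals(table));
        return before - stats.size();
    }

    // Id-keyed delete: removes every row that references the table id.
    static int deleteByTblId(List<StatRow> stats, long tblId) {
        int before = stats.size();
        stats.removeIf(r -> r.tblId == tblId);
        return before - stats.size();
    }

    // State after the rename in the repro: the rows still carry the old names.
    static List<StatRow> statsAfterRename() {
        List<StatRow> stats = new ArrayList<>();
        stats.add(new StatRow(19816L, "testes", "table_name_with_partition", "id"));
        stats.add(new StatRow(19816L, "testes", "table_name_with_partition", "name"));
        return stats;
    }

    public static void main(String[] args) {
        List<StatRow> byName = statsAfterRename();
        int removedByName = deleteByName(byName, "testes2", "table_that_cant_be_droped");
        System.out.println("name-keyed delete removed " + removedByName
            + ", rows still referencing TBL_ID: " + byName.size());

        List<StatRow> byId = statsAfterRename();
        int removedById = deleteByTblId(byId, 19816L);
        System.out.println("id-keyed delete removed " + removedById
            + ", rows still referencing TBL_ID: " + byId.size());
    }
}
```

In this model the name-keyed delete leaves both rows in place, which corresponds to the rows that the TAB_COL_STATS_fkey constraint reports as still referencing the table being dropped.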