> On March 15, 2017, 6:24 p.m., pengcheng xiong wrote: > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java > > Line 27 (original), 27 (patched) > > <https://reviews.apache.org/r/57566/diff/4/?file=1665117#file1665117line27> > > > > I have a general question to understand the problem that you would like > > to resolve. I tried on current master. If there are two tables a and b and > > I rename a to b, I can see the error but a's table and column stats are not > > affected. If i change a's location to an existing one, its column and table > > stats disappeared. This is expected as we are now using the new location. > > Could u be talk more about how to reproduce the problem? Thanks.
Hi, PengCheng, thanks for reviewing the patch. The current logic to rename a managed table is as following: 1. make changes to table metadata (e.g. table name, data location, and partition data location if it has partitions) and column stats (since TAB_COL_STAT table has the table_name feild), then commit the transaction. 2. if transaction commit succeeds, alter table then moves data to the renamed table location, otherwise the transaction is rolled back, therefore no changes happen in backend DB. {code} if (!success) { msdb.rollbackTransaction(); } if (success && moveData) { // change the file name in hdfs // check that src exists otherwise there is no need to copy the data // rename the src to destination try { if (srcFs.exists(srcPath) && !srcFs.rename(srcPath, destPath)) { throw new IOException("Renaming " + srcPath + " to " + destPath + " failed"); } } {code} 3. if move data fails, for example, srcFs.exists(srcPath) && !srcFs.rename(srcPath, destPath) throws IOException for encryption zone incompatibility issue etc, the operation starts another transaction to revert the DB changes which have been committed in step 1. {code} msdb.openTransaction(); msdb.alterTable(newt.getDbName(), newt.getTableName(), oldt); for (ObjectPair<Partition, String> pair : altps) { Partition part = pair.getFirst(); part.getSd().setLocation(pair.getSecond()); msdb.alterPartition(newt.getDbName(), name, part.getValues(), part); } revertMetaDataTransaction = msdb.commitTransaction(); {code} But you might notice that the line of code msdb.alterTable(newt.getDbName(), newt.getTableName(), oldt) just reverts the Hive table metadata (e.g. table name in TBLS, location in SD), but not the table name in TAB_COL_STATS. It causes two consequences: a) you could no more get the table col stats because the entries in TAB_COL_STATS now have a different table name, the one you used in the failed rename; b) you even can not drop this Hive table, because these entries in TAB_COL_STATS can not deleted due to mismatched table name but they still hold a reference (TBL_ID) to this Hive table in TBLS. How to fix this issue: We might add the logic to revert the change to TAB_COL_STATS in step 3. But it is not an ideal approach since the reverting could fail as well. In additon, I think the logic alter table is a little convulted. A simple way might be a). first move data for the altered table; b). if moving succeeds, commit the transaction for DB changes (e.g. table metadata and column stats), otherwise roll back the transaction. c) if the transaction is rolled back but data has been moved, moving the data back. Reverting data location is easier than reverting metadata changes in DB given so many backend tables are involved. It might be easier to reproduce the issues using encryption_move_tbl.q which can be run using command "mvn test -Dtest=TestEncryptedHDFSCliDriver -Dqfile=encryption_move_tbl.q". You need modify the encryption_move_tbl.q to compute the columns stats and describe these stats for table encrypted_table: {code}INSERT OVERWRITE TABLE encrypted_table SELECT * FROM src; SHOW TABLES; ANALYZE TABLE encrypted_table COMPUTE STATISTICS FOR COLUMNS; DESCRIBE FORMATTED encrypted_table key; DESCRIBE FORMATTED encrypted_table value; -- should fail, since they are in different encryption zones, but table columns statistics should not change ALTER TABLE default.encrypted_table RENAME TO encrypted_db.encrypted_table_2; SHOW TABLES; DESCRIBE FORMATTED encrypted_table key; DESCRIBE FORMATTED encrypted_table value; {code} You will see that columns stats are all gone after the failed alter table due to EZ incompatibility. You could even not drop table encrypted_table because of the corrupted column stats in TAB_COL_STATS. > On March 15, 2017, 6:24 p.m., pengcheng xiong wrote: > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java > > Line 311 (original) > > <https://reviews.apache.org/r/57566/diff/4/?file=1665117#file1665117line318> > > > > The deleted lines are dealing with the partition table. Is it safe to > > do so? This line of code is to revert the DB changes to a partitioned table if moving data fails. We do not need this in new code because the DB transaction either commits or rolls back, and the 2nd transaction to revert the DB changes is no more needed. - Chaoyu ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57566/#review169037 ----------------------------------------------------------- On March 15, 2017, 1:58 a.m., Chaoyu Tang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/57566/ > ----------------------------------------------------------- > > (Updated March 15, 2017, 1:58 a.m.) > > > Review request for hive. > > > Bugs: HIVE-16189 > https://issues.apache.org/jira/browse/HIVE-16189 > > > Repository: hive-git > > > Description > ------- > > If the table rename does not succeed due to its failure in moving the data to > the new renamed table folder, the changes in TAB_COL_STATS are not rolled > back which leads to invalid column stats. > > This patch changes the order of metadata update and data move in alter table > rename operation, which makes it easier to roll back metadata changes when > moving data fails in rename a table. > > > Diffs > ----- > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java > bae39ac > ql/src/test/queries/clientpositive/encryption_move_tbl.q 7a5de7b > ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out > cc363ac > > > Diff: https://reviews.apache.org/r/57566/diff/4/ > > > Testing > ------- > > > Thanks, > > Chaoyu Tang > >