> On March 15, 2017, 6:24 p.m., pengcheng xiong wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Line 27 (original), 27 (patched)
> > <https://reviews.apache.org/r/57566/diff/4/?file=1665117#file1665117line27>
> >
> >     I have a general question to understand the problem that you would like 
> > to resolve. I tried on current master. If there are two tables a and b and 
> > I rename a to b, I can see the error but a's table and column stats are not 
> > affected. If i change a's location to an existing one, its column and table 
> > stats disappeared. This is expected as we are now using the new location. 
> > Could u be talk more about how to reproduce the problem? Thanks.

Hi, PengCheng, thanks for reviewing the patch. The current logic to rename a 
managed table is as following:
1. make changes to table metadata (e.g. table name, data location, and 
partition data location if it has partitions) and column stats (since 
TAB_COL_STAT table has the table_name feild), then commit the transaction.
2. if transaction commit succeeds, alter table then moves data to the renamed 
table location, otherwise the transaction is rolled back, therefore no changes 
happen in backend DB.
{code}
      if (!success) {
        msdb.rollbackTransaction();
      }

      if (success && moveData) {
        // change the file name in hdfs
        // check that src exists otherwise there is no need to copy the data
        // rename the src to destination
        try {
          if (srcFs.exists(srcPath) && !srcFs.rename(srcPath, destPath)) {
            throw new IOException("Renaming " + srcPath + " to " + destPath + " 
failed");
          }
        }
{code}
3. if move data fails, for example, srcFs.exists(srcPath) && 
!srcFs.rename(srcPath, destPath) throws IOException for encryption zone 
incompatibility issue etc, the operation starts another transaction to revert 
the DB changes which have been committed in step 1.
{code}
            msdb.openTransaction();
            msdb.alterTable(newt.getDbName(), newt.getTableName(), oldt);
            for (ObjectPair<Partition, String> pair : altps) {
              Partition part = pair.getFirst();
              part.getSd().setLocation(pair.getSecond());
              msdb.alterPartition(newt.getDbName(), name, part.getValues(), 
part);
            }
            revertMetaDataTransaction = msdb.commitTransaction();
{code}
But you might notice that the line of code msdb.alterTable(newt.getDbName(), 
newt.getTableName(), oldt) just reverts the Hive table metadata (e.g. table 
name in TBLS, location in SD), but not the table name in TAB_COL_STATS. It 
causes two consequences: a) you could no more get the table col stats because 
the entries in TAB_COL_STATS now have a different table name, the one you used 
in the failed rename; b) you even can not drop this Hive table, because these 
entries in TAB_COL_STATS can not deleted due to mismatched table name but they 
still hold a reference (TBL_ID) to this Hive table in TBLS.

How to fix this issue:
We might add the logic to revert the change to TAB_COL_STATS in step 3. But it 
is not an ideal approach since the reverting could fail as well. In additon, I 
think the logic alter table is a little convulted. A simple way might be a). 
first move data for the altered table; b). if moving succeeds, commit the 
transaction for DB changes (e.g. table metadata and column stats), otherwise 
roll back the transaction. c) if the transaction is rolled back but data has 
been moved, moving the data back. Reverting data location is easier than 
reverting metadata changes in DB given so many backend tables are involved.

It might be easier to reproduce the issues using encryption_move_tbl.q which 
can be run using command "mvn test -Dtest=TestEncryptedHDFSCliDriver 
-Dqfile=encryption_move_tbl.q". You need modify the encryption_move_tbl.q to 
compute the columns stats and describe these stats for table encrypted_table:
{code}INSERT OVERWRITE TABLE encrypted_table SELECT * FROM src;
SHOW TABLES;
ANALYZE TABLE encrypted_table COMPUTE STATISTICS FOR COLUMNS;
DESCRIBE FORMATTED encrypted_table key;
DESCRIBE FORMATTED encrypted_table value;

-- should fail, since they are in different encryption zones, but table columns 
statistics should not change
ALTER TABLE default.encrypted_table RENAME TO encrypted_db.encrypted_table_2;
SHOW TABLES;
DESCRIBE FORMATTED encrypted_table key;
DESCRIBE FORMATTED encrypted_table value;
{code}
You will see that columns stats are all gone after the failed alter table due 
to EZ incompatibility. You could even not drop table encrypted_table because of 
the corrupted column stats in TAB_COL_STATS.


> On March 15, 2017, 6:24 p.m., pengcheng xiong wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Line 311 (original)
> > <https://reviews.apache.org/r/57566/diff/4/?file=1665117#file1665117line318>
> >
> >     The deleted lines are dealing with the partition table. Is it safe to 
> > do so?

This line of code is to revert the DB changes to a partitioned table if moving 
data fails. We do not need this in new code because the DB transaction either 
commits or rolls back, and the 2nd transaction to revert the DB changes is no 
more needed.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57566/#review169037
-----------------------------------------------------------


On March 15, 2017, 1:58 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57566/
> -----------------------------------------------------------
> 
> (Updated March 15, 2017, 1:58 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16189
>     https://issues.apache.org/jira/browse/HIVE-16189
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> If the table rename does not succeed due to its failure in moving the data to 
> the new renamed table folder, the changes in TAB_COL_STATS are not rolled 
> back which leads to invalid column stats.
> 
> This patch changes the order of metadata update and data move in alter table 
> rename operation, which makes it easier to roll back metadata changes when 
> moving data fails in rename a table.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 
> bae39ac 
>   ql/src/test/queries/clientpositive/encryption_move_tbl.q 7a5de7b 
>   ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 
> cc363ac 
> 
> 
> Diff: https://reviews.apache.org/r/57566/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>

Reply via email to