Marta Kuczora created HIVE-22336:
------------------------------------

             Summary: The updates should be pushed to the Metastore backend DB 
before creating the notification event
                 Key: HIVE-22336
                 URL: https://issues.apache.org/jira/browse/HIVE-22336
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 4.0.0
            Reporter: Marta Kuczora


There was an issue on HDP-3.1 where a table couldn't be deleted because some 
related objects (such as the storage descriptor) were missing from the 
metastore. A previous delete attempt on that table had gone wrong, but no 
rollback happened, which is why the SD was missing. In that previous delete, 
the notification creation swallowed the error coming from the backend DB, so 
no rollback was triggered. These are the steps of the first delete attempt:

 
# Open a transaction (transaction_1) - this step was successful
# Delete all the objects which are related to the table - this step was 
successful too, so the SD and other objects were deleted
# Delete the table - this step failed in the backend DB, but according to the 
log the delete is issued as part of a batch statement, so it is not 
necessarily executed at this moment and no error is visible here
# Create a notification about the table delete:
## Open another transaction for the notification creation (transaction_2) - 
this calls the ObjectStore.openTransaction method, which increments a counter 
of open transactions and then checks whether there is already an active 
transaction. If there is, it just returns true and doesn't actually create a 
new transaction.
## Lock the notification id in the metastore backend db for update - here is 
where the exception from the backend DB (let's call it "MySQL Exception") 
manifests
## If an exception occurs while acquiring the lock, retry - the "MySQL 
Exception" was caught, and since the exception is not checked, the retry 
mechanism assumed it was caused by a failure to acquire the lock for the 
notification id, so it retried and "forgot" about the "MySQL Exception".
## If the lock was acquired successfully, create the notification - on the 
second attempt the lock was acquired successfully, so the notification was 
created.
## Commit transaction_2 - this just decrements the transaction counter and 
doesn't actually commit anything.
# Commit transaction_1 - this commits the transaction, but since the error had 
already manifested and been "handled", no error is seen here; the commit 
reports success, so no rollback happens and the table object is left in an 
invalid state.
# If the commit was not successful then rollback
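
The sequence above can be modelled with a small toy program. This is only a 
sketch under assumptions, not Hive code: FakeSession, lockWithBlindRetry and 
runDelete are hypothetical names, the batching is reduced to a list of queued 
strings, and a failed batch is assumed to be dropped from the queue. It shows 
how a deferred backend error can surface inside the lock retry loop, be 
discarded there, and leave the outer commit looking successful.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the failure sequence; not Hive code. All names are hypothetical. */
final class SwallowedErrorDemo {

    /** Stand-in for a backend session that batches statements. */
    static class FakeSession {
        private final List<String> pending = new ArrayList<>();
        private int openCount = 0;
        boolean committed = false;

        // A nested "open" only bumps a counter, as described for openTransaction.
        boolean openTransaction() { openCount++; return true; }

        // Queue a statement; nothing is sent to the backend yet.
        void queue(String stmt) { pending.add(stmt); }

        // Send the queued statements; one marked FAIL raises the deferred error.
        // Assumption: the failed batch is dropped, so a later flush sees nothing.
        void flush() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            for (String s : batch) {
                if (s.startsWith("FAIL")) {
                    throw new IllegalStateException("backend error: " + s);
                }
            }
        }

        // Taking the lock is a query, so it pushes the queued statements out first.
        void lockNotificationId() { flush(); }

        // Only the outermost commit really commits.
        void commitTransaction() {
            openCount--;
            if (openCount == 0) { flush(); committed = true; }
        }
    }

    /** Retry loop that treats every exception as "could not get the lock". */
    static int lockWithBlindRetry(FakeSession s, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                s.lockNotificationId();
                return attempt;
            } catch (RuntimeException e) {
                // The real cause is discarded here and the loop simply retries.
            }
        }
        throw new IllegalStateException("could not lock notification id");
    }

    /** Returns true if the delete committed even though the backend failed. */
    static boolean runDelete() {
        FakeSession s = new FakeSession();
        s.openTransaction();                  // transaction_1
        s.queue("DELETE related objects");
        s.queue("FAIL DELETE table");         // fails only when flushed
        s.openTransaction();                  // "transaction_2": counter only
        int attempts = lockWithBlindRetry(s, 3);
        s.queue("INSERT notification");
        s.commitTransaction();                // counter only
        s.commitTransaction();                // real commit, sees no error
        return s.committed && attempts == 2;
    }

    public static void main(String[] args) {
        System.out.println("committed despite backend error: " + runDelete());
    }
}
```

In this model the first lock attempt absorbs the backend failure and the 
second one succeeds, which mirrors the retry step described above.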

In the customer setup, this issue could be fixed by adding a flush call before 
creating the notification event, so that all the updates are pushed to the 
backend DB and the error manifests at that point. The error then propagates 
back to the HiveMetastore class, which does the rollback, and the delete table 
operation fails as it should, since the table couldn't be deleted. The 
HiveMetastore retry mechanism can then attempt the table deletion again.
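
A minimal sketch of the flush-before-notification idea, again as a toy model 
rather than the actual patch: FakeSession, addNotification and dropTable are 
hypothetical names. In a JDO-based store the flush would presumably map to 
PersistenceManager.flush(), but that mapping is an assumption here. The point 
is only that the flush sits outside the lock retry loop, so a deferred error 
reaches the caller, which can roll back.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposed fix; names are hypothetical, not Hive APIs. */
final class FlushBeforeNotificationDemo {

    /** Stand-in for a backend session that batches statements. */
    static class FakeSession {
        private final List<String> pending = new ArrayList<>();
        boolean committed = false;
        boolean rolledBack = false;

        void queue(String stmt) { pending.add(stmt); }

        // Send the queued statements; one marked FAIL raises the deferred error.
        void flush() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            for (String s : batch) {
                if (s.startsWith("FAIL")) {
                    throw new IllegalStateException("backend error: " + s);
                }
            }
        }

        void lockNotificationId() { flush(); }
        void commit() { flush(); committed = true; }
        void rollback() { pending.clear(); rolledBack = true; }
    }

    static void addNotification(FakeSession s) {
        // Push pending updates first, so a deferred error is raised here,
        // outside the retry loop, and propagates to the caller.
        s.flush();
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                s.lockNotificationId();
                break;
            } catch (RuntimeException e) {
                // Still a blind retry, but by now it can only see lock failures.
            }
        }
        s.queue("INSERT notification");
    }

    /** Runs a delete and returns the session so its final state can be inspected. */
    static FakeSession dropTable(boolean tableDeleteFails) {
        FakeSession s = new FakeSession();
        try {
            s.queue("DELETE related objects");
            s.queue(tableDeleteFails ? "FAIL DELETE table" : "DELETE table");
            addNotification(s);
            s.commit();
        } catch (RuntimeException e) {
            s.rollback();   // the caller sees the error and rolls back
        }
        return s;
    }

    public static void main(String[] args) {
        FakeSession failed = dropTable(true);
        System.out.println("rolled back: " + failed.rolledBack
                + ", committed: " + failed.committed);
    }
}
```

With the flush in place, the failing delete ends in a rollback instead of a 
silent commit, and a successful delete still commits as before.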




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
