[ https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537407#comment-13537407 ]
Kevin Wilfong commented on HIVE-3826:
-------------------------------------

https://reviews.facebook.net/D7539

> Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3826
>                 URL: https://issues.apache.org/jira/browse/HIVE-3826
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.11
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database
> row)" from the metastore, but one cause seems to be related to a drop command
> failing and being retried by the client.
>
> Focusing on a single thread in the metastore with DEBUG level logging, I saw
> the objects that were intended to be dropped remaining in the
> PersistenceManager cache even after a rollback. The steps seemed to be as
> follows:
>
> 1) On the first attempt to drop the table, the table is pulled into the
> PersistenceManager cache for the purposes of dropping.
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, which
> causes a rollback of the transaction.
> 3) The drop is retried using a different thread on the metastore Thrift
> server, or a different server, and succeeds.
> 4) Back on the original thread of the original Thrift server, someone tries
> to perform some write operation which produces a commit. This causes the
> detached objects related to the dropped table to attempt to reattach, so JDO
> queries the SQL backend for those objects, which it can't find. This causes
> the exception.
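
The four steps above can be replayed in a small toy model. This is only a sketch of the suspected mechanism, not Hive or DataNucleus code: Backend, ToyPersistenceManager, and replay are hypothetical names, and the evictOnRollback switch is an assumed mitigation included purely for contrast. It does not describe what the attached patch does.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these types are Hive, JDO, or DataNucleus APIs.
class StaleCacheSketch {

    /** Stands in for the shared SQL backend: a set of row ids. */
    static class Backend {
        final Set<String> rows = new HashSet<>();
    }

    /** Stands in for one metastore thread's PersistenceManager and its cache. */
    static class ToyPersistenceManager {
        private final Backend backend;
        private final Set<String> cache = new HashSet<>();
        private final boolean evictOnRollback; // hypothetical mitigation switch

        ToyPersistenceManager(Backend backend, boolean evictOnRollback) {
            this.backend = backend;
            this.evictOnRollback = evictOnRollback;
        }

        /** Steps 1 and 2: load the object for the drop, then fail and roll back. */
        void failedDrop(String id) {
            cache.add(id);         // object pulled into the cache for dropping
            if (evictOnRollback) { // rollback; the cache survives unless evicted
                cache.clear();
            }
        }

        /** Step 3: a drop that succeeds and removes the row from the backend. */
        void drop(String id) {
            backend.rows.remove(id);
            cache.remove(id);
        }

        /** Step 4: a commit re-checks every cached object before writing. */
        void commitNewRow(String id) {
            for (String cached : cache) {
                if (!backend.rows.contains(cached)) {
                    throw new IllegalStateException("No such database row: " + cached);
                }
            }
            backend.rows.add(id);
        }
    }

    /** Replays the four steps; returns true if the final commit succeeds. */
    static boolean replay(boolean evictOnRollback) {
        Backend backend = new Backend();
        backend.rows.add("t1");
        ToyPersistenceManager thread1 = new ToyPersistenceManager(backend, evictOnRollback);
        ToyPersistenceManager thread2 = new ToyPersistenceManager(backend, evictOnRollback);

        thread1.failedDrop("t1");       // steps 1 and 2 on the original thread
        thread2.drop("t1");             // step 3 on a different thread or server
        try {
            thread1.commitNewRow("d2"); // step 4 back on the original thread
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("cache kept after rollback:    commit ok = " + replay(false));
        System.out.println("cache cleared after rollback: commit ok = " + replay(true));
    }
}
```

In this model the failure needs two things at once: a cache entry that outlives the rollback, and a later commit on the same thread that revalidates it. Removing either one makes the final commit succeed.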
> I was able to reproduce this regularly using the following sequence of
> commands:
>
> Hive client 1 (Hive1): connected to a metastore Thrift server running a
> single thread. I hard coded a RuntimeException into the ObjectStore code
> that drops a table, specifically right before the commit in
> preDropStorageDescriptor, to induce a rollback.
> Hive client 2 (Hive2): connected to a separate metastore Thrift server
> running with standard configs and code.
>
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed. I'm not
> sure why this step was necessary, but the reproduction didn't work without
> it; it seemed to have an effect on the order in which objects were committed
> in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist; the command
> would fail with the NucleusObjectNotFoundException
>
> The object that caused the exception varied: I saw the MTable, the
> MSerDeInfo, and the MTablePrivilege from the table whose drop had been
> attempted.
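
The fault-injection trick used on Hive1 (throwing right before the commit so that the drop rolls back) can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not Hive's ObjectStore: ToyStore, failBeforeCommit, and demo are hypothetical names, and the transaction here is just a staged list of deletes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical, not Hive's ObjectStore.
class FaultInjectionSketch {

    /** Minimal store: staged drops become visible only at commit. */
    static class ToyStore {
        final List<String> tables = new ArrayList<>();
        private final List<String> pendingDrops = new ArrayList<>();
        boolean failBeforeCommit; // stands in for the hard coded exception
        int rollbacks;

        void dropTable(String name) {
            pendingDrops.add(name); // open the transaction and stage the delete
            try {
                if (failBeforeCommit) {
                    throw new RuntimeException("injected failure before commit");
                }
                tables.removeAll(pendingDrops); // commit
            } catch (RuntimeException e) {
                rollbacks++;                    // rollback: staged work is discarded
                throw e;
            } finally {
                pendingDrops.clear();
            }
        }
    }

    /** Fails one drop on purpose, then retries it without the injected fault. */
    static String demo() {
        ToyStore store = new ToyStore();
        store.tables.add("t1");

        store.failBeforeCommit = true;
        boolean firstAttemptFailed = false;
        try {
            store.dropTable("t1");
        } catch (RuntimeException e) {
            firstAttemptFailed = true;
        }
        boolean presentAfterRollback = store.tables.contains("t1");

        store.failBeforeCommit = false;
        store.dropTable("t1"); // the retry succeeds
        boolean presentAfterRetry = store.tables.contains("t1");

        return firstAttemptFailed + "," + presentAfterRollback + ","
                + presentAfterRetry + "," + store.rollbacks;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The injected failure leaves the table in place after the rollback, which is what makes the later retry from a second client possible.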