RE: ORA-8177 with Hive transactions

Mich Talebzadeh Thu, 24 Sep 2015 13:28:51 -0700

Yes I came across this back in April trying to load 1.7 million rows from an 
RDBMS via SAP replication server into Hive


 

My notes were

 

“Trying to sync a table from ASE--> RS -->  to Hadoop via DIRECT LOAD. The 
source has 1.7 million rows and is populating Hive table. However, I only get 
around 50K rows in Hive table before MapReduce jobs gives up and get killed.

 

I have turned on concurrency and use an Oracle database as metastore. Data I 
believe is delivered in bulk through files show as rs_temp__nnn below. I 
thought that by turning concurrency on in Hive, I would have resolved the 
problem.

 

2015-04-16 17:04:34,773 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,784 INFO  [pool-3-thread-197]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(713)) - 158: source:127.0.0.1 get_table : 
db=asehadoop tbl=rs_temp__0x2aaab81d6ab0_t

2015-04-16 17:04:34,784 INFO  [pool-3-thread-197]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(339)) - ugi=hduser     ip=127.0.0.1    
cmd=source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab81d6ab0_t

2015-04-16 17:04:34,785 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,798 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,799 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,808 INFO  [pool-3-thread-198]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(713)) - 162: source:127.0.0.1 get_table : 
db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,809 INFO  [pool-3-thread-198]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(339)) - ugi=hduser     ip=127.0.0.1    
cmd=source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,810 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,813 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,827 ERROR [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(931)) - Too many repeated deadlocks in unlock, 
giving up.

2015-04-16 17:04:34,835 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,839 ERROR [pool-3-thread-199]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(141)) - org.apache.thrift.TException: 

MetaException(message:Unable to update transaction database 
java.sql.SQLException: ORA-08177: can't serialize access for this transaction

 

 

Now that ORA-08177 means that the transaction ordering of bulk data from RS . 
According to the docs, ORA-08177 can be caused only by serializable 
transactions. It means that a row which the serializable transaction is trying 
to modify was modified by another transaction after the serializable 
transaction has begun. “

 

After a couple of weeks I came up with the following approach

 

“OK guys,

 

Some good news

 

Sounds like setting these two parameters helps!

 

1.     --

-- Parameter "mat_load_tran_size", Default: 10000, specifies the optimal 
transaction size or batch size for the initial copying of primary data to the 
replicate table during direct load materialization.

alter connection to hiveserver2.asehadoop set mat_load_tran_size to "50000"

go

 

-- Parameter "max_mat_load_threads", Default: 5, specifies the maximum number 
of load threads for each table being materialized.

alter connection to hiveserver2.asehadoop set max_mat_load_threads to "1"

 

Makes things work without falling over

 

2.     Need to have concurrency enabled in Hive metastore. Mine is on Oracle. 
You need to run separate sql against it one labelled like 
hive-txn-schema-0.14.0.oracle.sql after the basic one 
hive-schema-0.14.0.oracle.sql

3.     Make sure that concurrency is enabled in Hive. hive.support.concurrency 
<https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.support.concurrency>
   is false by default

4.     Once concurrency in Hive is enabled, you need to install and run Apache 
zookeeper <https://zookeeper.apache.org/>  for distributed lock management 
otherwise you are going to encounter deadlock or serialisation issues in your 
metadata as below

2015-04-16 17:04:34,785 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,798 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,799 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,808 INFO  [pool-3-thread-198]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(713)) - 162: source:127.0.0.1 get_table : 
db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,809 INFO  [pool-3-thread-198]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(339)) - ugi=hduser     ip=127.0.0.1    
cmd=source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,810 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,813 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,827 ERROR [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(931)) - Too many repeated deadlocks in unlock, 
giving up.

2015-04-16 17:04:34,835 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,839 ERROR [pool-3-thread-199]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(141)) - org.apache.thrift.TException: 

MetaException(message:Unable to update transaction database 
java.sql.SQLException: ORA-08177: can't serialize access for this transaction

 

5.     I believe the best way is to add a searchable column (PK)  to repdef and 
replicate a sub-range of rows by where clause say ½ million rows at a time

 

OK just right now I managed to get ½ million rows into hive pretty quickly.

 

Source table in ASE

 

1> select count(1) from t

2> go

 

-----------

      500000

 

(1 row affected)

 

In Replicate database asehaddp in Hive

 

hive> use asehadoop;

OK

hive> select count(1) from t;

Query ID = hduser_20150429223737_1c0522a4-a7d2-4b71-873a-49e698e4c17a

Starting Job = job_1430341149959_0052, Tracking URL = 
http://rhes564:8088/proxy/application_1430341149959_0052/

Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1

2015-04-29 22:37:47,737 Stage-1 map = 0%,  reduce = 0%

2015-04-29 22:37:55,028 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 2.92 sec

2015-04-29 22:37:56,062 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 8.95 sec

2015-04-29 22:38:01,262 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 11.8 sec

2015-04-29 22:38:02,295 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 13.28 
sec

2015-04-29 22:38:03,335 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
14.52 sec

OK

500000

 

 

I really need to go back and retry these tests

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> 

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

 

From: Steve Howard [mailto:[email protected]] 
Sent: 24 September 2015 14:53
To: [email protected]
Subject: Re: ORA-8177 with Hive transactions

 

All,

 

We continue to struggle with this.  We *never* get  the lock, and found one 
issue in which the retry logic gets in an infinite loop.  We submitted a JIRA 
for that (https://issues.apache.org/jira/browse/HIVE-11934), and patched our 
version (HDP 2.3, Hive 1.2.1) with a fix in which the deadlockCount variable is 
no longer managed in the lock() method.  That works, but we still couldn't get 
the lock, and the exception was thrown after ten retries.  At least we knew it 
was broken earlier ;)

 

We have made the changes to the HIVE transaction tables to enable 
ROWDEPENDENCIES, but are still plagued with serialization errors that are never 
resolved.

 

We have only a single writer, as the Hive database environment is used as a 
target for an existing EDW dataset.  The job loader is the only one making 
changes.  However, it is for analytics, so we have a lot of readers.

 

We have considered changing the TRANSACTION_SERIALIZABLE for the dbConn() 
method call in the TxnHandler class to READ_COMMITTED, as Oracle provides 
consistent reads.  Of course, the serialization exception is thrown when one 
thread (a read or a writer I guess) attempts to lock a hive table (or in our 
case, several hundred daily hive table partitions) attempts to update the row, 
and another thread has changed and committed it in the meantime.

 

Unless I missing it, this will always be an issue since we have readers and 
writers and each appears to take a lock.

 

If we know we will have a single writer, the largest risk is that the reader 
thinks the data hasn't changed, when it has.  For our needs, that isn't a huge 
issue.

 

Are we missing something?  Any ideas?

 

Thanks,

 

Steve

 

On Fri, Sep 18, 2015 at 3:39 PM, Steve Howard <[email protected] 
<mailto:[email protected]> > wrote:

I think ROWDEPENDENCIES on an Oracle table also covers this issue, so I don't 
think a separate JIRA is needed for the INITRANS change.

 

On Fri, Sep 18, 2015 at 2:51 PM, Sergey Shelukhin <[email protected] 
<mailto:[email protected]> > wrote:

There’s HIVE-11831 <https://issues.apache.org/jira/browse/HIVE-11831>  and 
https://issues.apache.org/jira/browse/HIVE-11833 that try to address this.

We can do a patch similar to the first one; can you file a JIRA?

 

From: Steve Howard <[email protected] <mailto:[email protected]> >
Reply-To: "[email protected] <mailto:[email protected]> " 
<[email protected] <mailto:[email protected]> >
Date: Friday, September 18, 2015 at 10:54
To: "[email protected] <mailto:[email protected]> " <[email protected] 
<mailto:[email protected]> >
Subject: ORA-8177 with Hive transactions

 

While troubleshooting an issue with transactions shortly after enabling them, I 
noticed the following in an Oracle trace, which is our metastore for hive... 

 

ORA-8177: can't serialize access for this transaction

 

These were thrown on "insert into HIVE_LOCKS..."

 

Traditionally in Oracle, if an application actually needs serializable 
transactions, the fix is to to set initrans and maxtrans to the number of 
concurrent writers.  When I ran what is below on a table similar to HIVE_LOCKS, 
this exception was thrown everywhere.  The fix is to recreate the table with 
higher values for initrans (only 1 is the default for initrans, and 255 is the 
default for maxtrans).  When I did this and re-ran what is below, the 
exceptions were no longer thrown.

 

Does anyone have any feedback on this performance hint?  The exceptions in hive 
are thrown from the checkRetryable method in the TxnHandler class, but I 
couldn't find what class.method throws them.  Perhaps the exceptions are not 
impactful, but given the fact the method expects them as it checks for the 
string in the exception message, I thought I would ask for feedback before we 
recreate the HIVE_LOCKS table with a higher value for INITRANS.

 

import java.sql.*;public class testLock implements Runnable {
  public static void main (String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    for (int i = 1; i <= 100; i++) {
      testLock tl = new testLock();
    }
  }
 
  public testLock() {
    Thread t = new Thread(this);
    t.start();
  }
 
  public void run() {
    try {
      Connection conn = 
DriverManager.getConnection("jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
      conn.createStatement().execute("alter session set isolation_level = 
serializable");
      PreparedStatement pst = conn.prepareStatement("update test set a = ?");
      for (int j = 1; j <= 10000; j++) {
        pst.setInt(1,j);
        pst.execute();
        conn.commit();
        System.out.println("worked");
      }
    }
    catch (Exception e) {
      System.out.println(e.getMessage());
    }
  }}

RE: ORA-8177 with Hive transactions

Reply via email to