Pavan Srinivas created HADOOP-12739:
---------------------------------------

             Summary: Deadlock with OrcInputFormat split threads and JetS3t 
connections, since NativeS3FileSystem does not release connections with seek()
                 Key: HADOOP-12739
                 URL: https://issues.apache.org/jira/browse/HADOOP-12739
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Pavan Srinivas


Recently, we came across a deadlock in OrcInputFormat while computing splits. 

- For split computation, ORC needs the file listing and the file sizes. 
- Multiple threads are started to list the files, and if the data is located 
in S3, NativeS3FileSystem is used. 
- NativeS3FileSystem in turn uses the JetS3t library to talk to AWS and to 
maintain a connection pool. 
- When the number of OrcInputFormat threads exceeds JetS3t's maximum number of 
connections, a deadlock occurs. Stack trace: 

{code}
"ORC_GET_SPLITS #5" daemon prio=10 tid=0x00007f8568108800 nid=0x1e29 in Object.wait() [0x00007f8565696000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000df9ed450> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
        - locked <0x00000000df9ed450> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:370)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestGet(RestStorageService.java:929)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2007)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:1944)
        at org.jets3t.service.S3Service.getObject(S3Service.java:2625)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:254)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at org.apache.hadoop.fs.s3native.$Proxy12.retrieve(Unknown Source)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.reopen(NativeS3FileSystem.java:269)
        - locked <0x00000000db01eec0> (a org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:258)
        - locked <0x00000000db01eec0> (a org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream)
        at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:98)
        at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
        - locked <0x00000000db01ee70> (a org.apache.hadoop.fs.FSDataInputStream)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:329)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:292)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:197)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:857)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:747)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

   Locked ownable synchronizers:
        - <0x00000000dae7bcb8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

{code}

A complete *jstack* dump of the process is attached. 
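
The failure mode described in this report can be illustrated with a small, 
self-contained model. This is only a sketch under stated assumptions, not the 
NativeS3FileSystem or JetS3t code: a java.util.concurrent.Semaphore stands in 
for the HTTP connection pool, and the names PoolExhaustionSketch, PooledStream, 
seekSucceeds and the releaseBeforeReopen flag are hypothetical. It assumes that 
each open stream holds one pooled connection and that seek() asks for a new 
connection; whether the old one is returned first is what the flag toggles. A 
timeout replaces the indefinite wait seen in the stack trace so that the 
example terminates.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionSketch {

    /** Hypothetical stream that holds one pooled connection while it is open. */
    static final class PooledStream {
        private final Semaphore pool;               // stand-in for the connection pool
        private final boolean releaseBeforeReopen;  // assumption: models the proposed behavior
        private boolean holdsConnection;

        PooledStream(Semaphore pool, boolean releaseBeforeReopen) {
            this.pool = pool;
            this.releaseBeforeReopen = releaseBeforeReopen;
            pool.acquireUninterruptibly();          // opening the stream takes a connection
            this.holdsConnection = true;
        }

        /** Models seek(): reopen on a new connection. Returns false on timeout. */
        boolean seek(long timeoutMillis) {
            if (releaseBeforeReopen && holdsConnection) {
                pool.release();                     // hand the old connection back first
                holdsConnection = false;
            }
            boolean acquired;
            try {
                acquired = pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (acquired && holdsConnection) {
                pool.release();                     // old connection is dropped only after the new one arrives
            }
            if (acquired) {
                holdsConnection = true;
            }
            return acquired;
        }
    }

    /** Opens as many streams as the pool has connections, then seeks on one of them. */
    public static boolean seekSucceeds(int poolSize, boolean releaseBeforeReopen) {
        Semaphore pool = new Semaphore(poolSize);
        PooledStream[] streams = new PooledStream[poolSize];
        for (int i = 0; i < poolSize; i++) {
            streams[i] = new PooledStream(pool, releaseBeforeReopen);
        }
        return streams[0].seek(200);
    }

    public static void main(String[] args) {
        System.out.println("seek without release succeeds: " + seekSucceeds(4, false));
        System.out.println("seek with release succeeds:    " + seekSucceeds(4, true));
    }
}
```

In this model, once every connection is held by an open stream, a seek that 
does not release first can never obtain a connection, which corresponds to 
the thread waiting in doGetConnection above. Releasing before reopening lets 
the same seek complete.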




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
