[ https://issues.apache.org/jira/browse/HIVE-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated HIVE-16854:
-------------------------------
    Status: Patch Available  (was: Open)

> SparkClientFactory is locked too aggressively
> ---------------------------------------------
>
>                 Key: HIVE-16854
>                 URL: https://issues.apache.org/jira/browse/HIVE-16854
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>         Attachments: 15763.jstack, HIVE-16854.patch
>
>
> Most methods in SparkClientFactory are synchronized on the SparkClientFactory
> singleton. Some of these methods are very expensive, such as createClient(),
> which returns a SparkClientImpl instance. Creating a SparkClientImpl instance
> requires starting a remote driver that connects back to RPCServer. This
> process can take a long time, for example when the YARN queue is busy. When
> this happens, all pending calls on SparkClientFactory have to wait for a
> long time.
> In our case, hive.spark.client.server.connect.timeout is set to 1hr. This
> makes some queries wait for hours before starting.
> The current implementation effectively serializes all remote driver
> launches. If one of them takes a long time, the following ones have to wait.
> HS2 stacktrace is attached for reference. It's based on an earlier version of
> Hive, so the line numbers might be slightly off.
> The following shows the locking effect:
> {code}
> xuefu@hadoopservice20-sjc1:~$ grep org.apache.hive.spark.client.SparkClientFactory 15763.jstack
> 	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
> 	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
> 	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
> 	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
> 	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
> 	- locked <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
> 	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
> 	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
> 	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
> 	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
> 	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
> 	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
> {code}
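
In the jstack output, every thread is blocked on the same java.lang.Class monitor. That is what you would see if createClient() holds the class-level lock for the whole driver launch. The sketch below shows one possible way to narrow that lock. It is not the attached HIVE-16854.patch and not Hive code. NarrowLockFactoryDemo, FakeServer, FakeClient and the Thread.sleep() that stands in for the remote driver launch are all hypothetical names invented for illustration. The idea is to keep the cheap lifecycle methods synchronized, publish the shared server through a volatile field, and run the slow launch outside the lock so that concurrent callers overlap instead of queueing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical factory used only for illustration; not Hive's SparkClientFactory. */
public class NarrowLockFactoryDemo {

    /** Hypothetical stand-in for the shared RPC server. */
    public static final class FakeServer {
        public final String address = "localhost:0";
    }

    /** Hypothetical stand-in for the object returned by createClient(). */
    public static final class FakeClient {
        public final String serverAddress;
        FakeClient(String serverAddress) { this.serverAddress = serverAddress; }
    }

    // volatile: createClient() can read the reference without taking the class lock.
    private static volatile FakeServer server;

    /** Cheap lifecycle call; still guarded by the class lock. */
    public static synchronized void initialize() {
        if (server == null) {
            server = new FakeServer();
        }
    }

    /** Cheap lifecycle call; still guarded by the class lock. */
    public static synchronized void stop() {
        server = null;
    }

    /** Slow call; deliberately NOT synchronized, so launches can overlap. */
    public static FakeClient createClient(long launchMillis) throws InterruptedException {
        FakeServer current = server; // single volatile read, then work on the local copy
        if (current == null) {
            throw new IllegalStateException("initialize() has not been called");
        }
        Thread.sleep(launchMillis); // assumption: simulates waiting for the remote driver
        return new FakeClient(current.address);
    }

    /** Runs several createClient() calls at once and returns the wall-clock time in ms. */
    public static long timeConcurrentCreates(int callers, long launchMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        try {
            List<Callable<FakeClient>> tasks = new ArrayList<>();
            for (int i = 0; i < callers; i++) {
                tasks.add(() -> createClient(launchMillis));
            }
            long start = System.nanoTime();
            for (Future<FakeClient> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any failure from a caller
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        initialize();
        long elapsed = timeConcurrentCreates(4, 200);
        System.out.println("4 concurrent creates took about " + elapsed + " ms");
        stop();
    }
}
```

If createClient() were declared synchronized, the four simulated launches would run back to back and take roughly four times as long, which is the queueing pattern in the jstack excerpt. One trade-off of this sketch is that a client created concurrently with stop() may still hold a reference to the old server, so a real change would need to decide how to handle that race.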