[ https://issues.apache.org/jira/browse/HIVE-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042140#comment-16042140 ]
Rui Li commented on HIVE-16854:
-------------------------------

Thanks Xuefu for finding the issue. I'll take a look.

> SparkClientFactory is locked too aggressively
> ---------------------------------------------
>
>                 Key: HIVE-16854
>                 URL: https://issues.apache.org/jira/browse/HIVE-16854
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>         Attachments: 15763.jstack
>
>
> Most methods in SparkClientFactory are synchronized on the SparkClientFactory
> singleton. Some of these methods are very expensive, such as createClient(),
> which returns a SparkClientImpl instance. Creating a SparkClientImpl instance
> requires starting a remote driver that connects back to the RPCServer. This
> can take a long time, for example when the YARN queue is busy. When this
> happens, all pending calls on SparkClientFactory have to wait for a long
> time.
> In our case, hive.spark.client.server.connect.timeout is set to 1hr. This
> makes some queries wait for hours before starting.
> The current implementation appears to serialize nearly all remote driver
> launches. If one of them takes a long time, the following ones have to wait.
> The HS2 stack trace is attached for reference. It is based on an earlier
> version of Hive, so the line numbers might be slightly off.
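
To make the reported contention concrete, here is a minimal Java sketch of the locking pattern, not Hive's actual code and not a proposed patch. The class NarrowLockFactoryDemo, its nested Factory, the stateLock field, and the sleep that stands in for a remote driver launch are all hypothetical names invented for illustration. It contrasts holding the lock for the whole slow launch (the behavior the report describes) with holding it only while shared state is read.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: a hypothetical factory, not Hive's SparkClientFactory.
public class NarrowLockFactoryDemo {

    static class Factory {
        private final Object stateLock = new Object();
        private final boolean holdLockDuringLaunch;
        private boolean initialized; // guarded by stateLock

        Factory(boolean holdLockDuringLaunch) {
            this.holdLockDuringLaunch = holdLockDuringLaunch;
        }

        void initialize() {
            synchronized (stateLock) {
                initialized = true;
            }
        }

        String createClient(String name, long launchMillis) {
            if (holdLockDuringLaunch) {
                // Coarse locking: the slow launch runs while the lock is held,
                // so every other caller queues behind it.
                synchronized (stateLock) {
                    requireInitialized();
                    return launch(name, launchMillis);
                }
            }
            // Narrow locking: hold the lock only to read shared state,
            // then do the slow launch without it.
            synchronized (stateLock) {
                requireInitialized();
            }
            return launch(name, launchMillis);
        }

        // Caller must hold stateLock.
        private void requireInitialized() {
            if (!initialized) {
                throw new IllegalStateException("factory not initialized");
            }
        }

        // Assumption: a sleep stands in for the slow remote driver launch.
        private static String launch(String name, long launchMillis) {
            try {
                Thread.sleep(launchMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("launch interrupted", e);
            }
            return "client:" + name;
        }
    }

    // Starts `clients` concurrent createClient() calls and returns the
    // elapsed wall-clock time in milliseconds.
    static long timeConcurrentLaunches(Factory factory, int clients, long launchMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        CountDownLatch done = new CountDownLatch(clients);
        long start = System.nanoTime();
        for (int i = 0; i < clients; i++) {
            final String name = "q" + i;
            pool.execute(() -> {
                try {
                    factory.createClient(name, launchMillis);
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("wait interrupted", e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        Factory coarse = new Factory(true);
        Factory narrow = new Factory(false);
        coarse.initialize();
        narrow.initialize();
        System.out.println("coarse lock: " + timeConcurrentLaunches(coarse, 3, 200) + " ms");
        System.out.println("narrow lock: " + timeConcurrentLaunches(narrow, 3, 200) + " ms");
    }
}
```

With the coarse lock, the three simulated launches run one after another, so the elapsed time grows with the number of callers. With the narrow lock they overlap. Whether the real createClient() can safely drop the lock during the launch depends on what shared state it touches, which this sketch does not model.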