[ https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604809#comment-16604809 ]
Sahil Takiar commented on HIVE-20506:
-------------------------------------

[~brocknoland] I think you might be hitting the {{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} timeout. {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} is the timeout for how long SASL negotiation takes between the {{RemoteDriver}} and HS2 (yes, I know it's a bit confusing). {{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} is set to 90 seconds by default, so HoS will essentially wait 90 seconds for the Spark application to be submitted. The app has to be submitted to and accepted by YARN, and the {{RemoteDriver}} has to start up and connect back to HS2, all within 90 seconds. In other words, if the cluster is busy, HoS will wait 90 seconds for the cluster to free up enough resources for the Spark app to start before issuing a timeout. Is my understanding of your problem correct?

I agree we should make the HoS behavior as close to the HoMR behavior as possible. I'm not entirely sure what HoMR does. Is there a timeout for the MapReduce application to be accepted?

> HOS times out when cluster is full while Hive-on-MR waits
> ---------------------------------------------------------
>
>                 Key: HIVE-20506
>                 URL: https://issues.apache.org/jira/browse/HIVE-20506
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Major
>
> My understanding is as follows:
> When the cluster is full, Hive-on-MR will wait for resources to become
> available before submitting a job. This is because the hadoop jar command
> is the primary mechanism Hive uses to know whether a job has completed or
> failed.
>
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}}
> because the RPC client in the AppMaster doesn't connect back to the RPC
> server in HS2.
>
> This is a behavior difference it'd be great to close.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
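As a practical workaround for the timeout behavior discussed above, the two {{HiveConf}} constants can be raised per-session. The mapping from constant name to configuration property below is an assumption (verify it against the {{HiveConf}} of your Hive version), and the 300-second value is only an illustrative choice for a busy cluster:

```sql
-- Assumed HiveConf mappings (treat as unverified):
--   SPARK_RPC_CLIENT_CONNECT_TIMEOUT   -> hive.spark.client.connect.timeout
--   SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT -> hive.spark.client.server.connect.timeout (default 90000ms)

-- Give the Spark app more time to be accepted by YARN and for the
-- RemoteDriver to connect back to HS2 before HoS gives up:
SET hive.spark.client.server.connect.timeout=300000ms;
```

This only stretches the window rather than matching the Hive-on-MR "wait indefinitely for resources" behavior, but it can help on clusters that are routinely full for a bounded period.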