[ 
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604809#comment-16604809
 ] 

Sahil Takiar commented on HIVE-20506:
-------------------------------------

[~brocknoland] I think you might be hitting the 
{{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} timeout. 
{{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} is the timeout for how long SASL 
negotiation takes between the {{RemoteDriver}} and HS2 (yes, I know it's a bit 
confusing).

{{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} is set to 90 seconds by default, so HoS 
will essentially wait 90 seconds for the Spark application to be submitted. The 
app has to be submitted and accepted by YARN, and the {{RemoteDriver}} has to 
start up and connect back to HS2, all within 90 seconds. In other words, if the 
cluster is busy, HoS will wait 90 seconds for the cluster to free up enough 
resources for the Spark app to start before timing out.
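
As a rough illustration of the behavior described above, here is a toy model 
in Python. It is not Hive code: the function name {{wait_for_driver}} and its 
parameters are made up for this sketch, and the "driver" is just a timer that 
fires once YARN acceptance, driver startup, and connect-back would have 
finished. In {{HiveConf}} the handshake constant should correspond to the 
{{hive.spark.client.server.connect.timeout}} property, but please check that 
against your Hive version.

```python
import threading


def wait_for_driver(driver_ready_s, handshake_timeout_s):
    """Toy model of HS2 waiting for the RemoteDriver to connect back.

    driver_ready_s: hypothetical total time (seconds) for the app to be
        submitted, accepted by YARN, started, and connected back to HS2.
    handshake_timeout_s: how long the waiting side is willing to wait
        (90 seconds by default, per the comment above).

    Returns True if the simulated driver connected in time, False if the
    wait timed out first.
    """
    connected = threading.Event()

    # Simulate the driver connecting back after driver_ready_s seconds.
    timer = threading.Timer(driver_ready_s, connected.set)
    timer.daemon = True
    timer.start()
    try:
        # Event.wait returns True if the event was set before the timeout.
        return connected.wait(timeout=handshake_timeout_s)
    finally:
        # Stop the simulated driver if it has not fired yet.
        timer.cancel()


if __name__ == "__main__":
    # Idle cluster: the driver is ready well inside the window.
    print(wait_for_driver(0.05, 0.5))
    # Busy cluster: resources free up only after the window has closed.
    print(wait_for_driver(0.5, 0.05))
```

In this sketch a busy cluster is just a large {{driver_ready_s}}: once it 
exceeds the timeout, the wait fails even though the app would eventually have 
started, which is the difference from HoMR being discussed here.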

Is my understanding of your problem correct?

I agree we should make the HoS behavior as close to the HoMR behavior as 
possible. I'm not entirely sure what HoMR does. Is there a timeout for the 
MapReduce application to be accepted?

> HOS times out when cluster is full while Hive-on-MR waits
> ---------------------------------------------------------
>
>                 Key: HIVE-20506
>                 URL: https://issues.apache.org/jira/browse/HIVE-20506
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Major
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available 
> before submitting a job. This is because the hadoop jar command is the 
> primary mechanism Hive uses to know if a job is complete or failed.
>  
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because 
> the RPC client in the AppMaster doesn't connect back to the RPC Server in 
> HS2. 
> This is a behavior difference it'd be great to close.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
