[jira] [Commented] (HIVE-20506) HOS times out when cluster is full while Hive-on-MR waits

Brock Noland (JIRA) Wed, 05 Sep 2018 16:36:05 -0700


    [ https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605054#comment-16605054 ]


Brock Noland commented on HIVE-20506:
-------------------------------------

[~stakiar] - perfect, yes, you understand correctly. Hive on MR will just wait 
forever for the job to be submitted. The reason is that Hive on MR simply 
executes a {{hadoop}} command and waits for it to return to decide whether the 
job failed or succeeded. One MR job equates to one stage. HOS, however, starts 
one Spark application per user session, so a single Spark application can run 
N stages or even N queries.

Thus, to fix this, I think we need to make HOS wait for the Spark application 
to actually start before the handshake timeout starts counting down.
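A minimal Java sketch of the proposed ordering, assuming HS2 can be notified when the Spark application has actually started on the cluster (that is, it obtained resources and is no longer queued). Every name here (`HandshakeWait`, `appStarted`, `clientConnected`, `simulate`) is hypothetical and is not Hive's actual code; the single `connectTimeoutMs` value stands in for {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the HS2 side of the HOS handshake. It only illustrates
 * *when* the connect timeout starts counting; it is not Hive's implementation.
 */
public class HandshakeWait {
    // Counted down when the cluster reports that the Spark application has started
    // (it got resources and is no longer just waiting in the queue).
    final CountDownLatch appStarted = new CountDownLatch(1);
    // Counted down when the RPC client in the AppMaster connects back to the RPC server in HS2.
    final CountDownLatch clientConnected = new CountDownLatch(1);

    /** Current behavior: one timeout covers both queueing and connecting back. */
    public boolean awaitCurrent(long connectTimeoutMs) throws InterruptedException {
        return clientConnected.await(connectTimeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Proposed behavior: wait like Hive-on-MR for the app to start, then start the timeout. */
    public boolean awaitProposed(long connectTimeoutMs) throws InterruptedException {
        appStarted.await(); // no timeout while the application is waiting for resources
        return clientConnected.await(connectTimeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Simulates a full cluster: the app starts after queueMs, then connects back after connectMs. */
    public static HandshakeWait simulate(long queueMs, long connectMs) {
        HandshakeWait h = new HandshakeWait();
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(queueMs);
                h.appStarted.countDown();
                Thread.sleep(connectMs);
                h.clientConnected.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return h;
    }

    public static void main(String[] args) throws InterruptedException {
        // The queue wait (500 ms) exceeds the connect timeout (200 ms); connecting takes 50 ms.
        System.out.println("current:  " + simulate(500, 50).awaitCurrent(200));
        System.out.println("proposed: " + simulate(500, 50).awaitProposed(200));
    }
}
```

With these numbers the single-timeout version gives up while the application is still queued, whereas the two-phase version tolerates the queue wait and only then enforces the timeout. Whether the start wait should be unbounded, as sketched, or capped by a separate setting is a design choice this sketch leaves open.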

> HOS times out when cluster is full while Hive-on-MR waits
> ---------------------------------------------------------
>
>                 Key: HIVE-20506
>                 URL: https://issues.apache.org/jira/browse/HIVE-20506
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Major
>
> My understanding is as follows:
> When the cluster is full, Hive-on-MR will wait for resources to become 
> available before submitting a job. This is because the hadoop jar command is 
> the primary mechanism Hive uses to know whether a job completed or failed.
>  
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} 
> because the RPC client in the AppMaster doesn't connect back to the RPC 
> Server in HS2 within that time.
> This is a behavior difference it'd be great to close.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
