[ https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605054#comment-16605054 ]
Brock Noland commented on HIVE-20506:
-------------------------------------

[~stakiar] - perfect, yes, you understand correctly. Hive on MR will just wait forever for the job to be submitted. The reason is that Hive on MR just executes a {{hadoop}} command and waits for it to return to decide whether the job failed or succeeded. One MR job equates to one stage. HOS, however, starts a Spark application per user session, so one Spark application can run N stages or even N queries. Thus, to fix this, I think we need to make HOS wait for the Spark app to actually start before the handshake timeout starts counting down.

> HOS times out when cluster is full while Hive-on-MR waits
> ---------------------------------------------------------
>
>                 Key: HIVE-20506
>                 URL: https://issues.apache.org/jira/browse/HIVE-20506
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Major
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available
> before submitting a job. This is because the hadoop jar command is the
> primary mechanism Hive uses to know if a job is complete or failed.
>
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because
> the RPC client in the AppMaster doesn't connect back to the RPC Server in
> HS2.
> This is a behavior difference it'd be great to close.
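
The fix proposed in the comment (start the handshake countdown only once the Spark app is actually running) could be sketched roughly as below. This is an illustration only, not code from Hive: the class {{DeferredHandshakeTimeout}}, the method {{awaitHandshake}}, and the two futures are hypothetical names, and the sketch assumes the caller has some signal that the application has started (for example, derived from the cluster manager's application state).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from the Hive codebase.
final class DeferredHandshakeTimeout {

    /**
     * Phase 1: wait with no deadline until the application has started,
     * which mirrors Hive-on-MR waiting while the cluster is full.
     * Phase 2: only then start counting the handshake timeout.
     */
    static <T> T awaitHandshake(CompletableFuture<Void> appStarted,
                                CompletableFuture<T> handshake,
                                long handshakeTimeoutMs) {
        appStarted.join(); // unbounded wait for cluster resources
        try {
            // The countdown begins here, not at submission time.
            return handshake.get(handshakeTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new IllegalStateException(
                "Timed out waiting for the remote client to connect back", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during handshake", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Handshake failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> appStarted = new CompletableFuture<>();
        CompletableFuture<String> handshake = new CompletableFuture<>();
        ScheduledExecutorService cluster =
            Executors.newSingleThreadScheduledExecutor();

        // Simulated full cluster: the app starts after 1000 ms and the
        // remote client connects back 100 ms later.
        cluster.schedule(() -> { appStarted.complete(null); },
                         1000, TimeUnit.MILLISECONDS);
        cluster.schedule(() -> { handshake.complete("connected"); },
                         1100, TimeUnit.MILLISECONDS);
        try {
            // The 500 ms timeout is shorter than the total 1100 ms wait,
            // but it only covers the period after the app has started.
            System.out.println(awaitHandshake(appStarted, handshake, 500));
        } finally {
            cluster.shutdown();
        }
    }
}
```

With a single timeout counted from submission, the simulated run above would fail once the queueing delay exceeded the timeout. Splitting the wait into two phases keeps the handshake timeout meaningful as a check on the RPC connection itself. Whether the first phase should be truly unbounded or governed by a separate, longer limit is a design choice this sketch leaves open.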