[ https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243665#comment-14243665 ]
Chengxiang Li commented on HIVE-9078:
-------------------------------------

Yes, Xuefu, the retry logic in SparkJobMonitor is inherited from TezJobMonitor, which SparkJobMonitor was originally cloned from. Looking back at it now, it does not make much sense for Spark: if an exception is thrown while getting the Spark job status, we should expose it and try to find the root cause and fix it, not retry it. I've tested the failed qfiles locally; the join-related qfiles all seem to fail for reasons of their own, while vector_cast_constant.q succeeds in my local test. Let's wait for the result of the second automatic test run.

> Hive should not submit second SparkTask while previous one has failed.[Spark Branch]
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-9078
>                 URL: https://issues.apache.org/jira/browse/HIVE-9078
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>              Labels: Spark-M4
>         Attachments: HIVE-9078.1-spark.patch, HIVE-9078.1-spark.patch, HIVE-9078.2-spark.patch
>
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are two issues in the CLI output above:
> # For a query that is translated into multiple SparkTasks, if a previous SparkTask has failed, Hive should fail right away; the following SparkTasks should not be submitted anymore (see the sketch after the expected output below).
> # Failure information should be printed on the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
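For illustration only, here is a minimal sketch of the task-submission behavior requested in point 1 above. This is not Hive's actual Driver code; SparkTaskStub, runTasks, and the task names are hypothetical stand-ins for org.apache.hadoop.hive.ql.exec.spark.SparkTask and the surrounding execution loop.

{code:java}
import java.util.Arrays;
import java.util.List;

public class SequentialTaskRunner {

    /** Hypothetical stand-in for a Hive task; execute() returns 0 on success. */
    interface SparkTaskStub {
        String name();
        int execute();
    }

    /**
     * Runs tasks in order and aborts as soon as one returns a non-zero code,
     * instead of blindly launching the remaining tasks.
     */
    static int runTasks(List<SparkTaskStub> tasks) {
        for (SparkTaskStub task : tasks) {
            int rc = task.execute();
            if (rc != 0) {
                // Surface the failure immediately and skip the remaining tasks.
                System.err.printf("FAILED: Execution Error, return code %d from %s%n",
                        rc, task.name());
                return rc;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        SparkTaskStub job1 = new SparkTaskStub() {
            public String name() { return "Job 1"; }
            public int execute() { return 2; }   // simulate a failed first job
        };
        SparkTaskStub job2 = new SparkTaskStub() {
            public String name() { return "Job 2"; }
            public int execute() { return 0; }   // never reached
        };
        System.exit(runTasks(Arrays.asList(job1, job2)));
    }
}
{code}

With this structure, the failed first job short-circuits the loop, so "Launching Job 2 out of 2" would never appear, matching the expected output shown above.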
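Similarly, a minimal sketch of the "no retry" behavior argued for in the comment above: a status-polling loop that propagates the first exception instead of catching it and retrying. This is not the actual SparkJobMonitor code; JobStatusSource and the state strings are assumptions made for the example.

{code:java}
public class NoRetryJobMonitor {

    /** Hypothetical source of job state; getState() may throw on RPC/driver problems. */
    interface JobStatusSource {
        String getState() throws Exception;   // e.g. "RUNNING", "SUCCEEDED", "FAILED"
    }

    /**
     * Polls until the job reaches a terminal state. Any exception from getState()
     * is rethrown to the caller so the root cause is visible, rather than being
     * swallowed and retried a fixed number of times.
     */
    static String monitor(JobStatusSource job, long pollIntervalMs) throws Exception {
        while (true) {
            String state = job.getState();   // no try/catch with a retry counter here
            if ("SUCCEEDED".equals(state) || "FAILED".equals(state)) {
                return state;
            }
            Thread.sleep(pollIntervalMs);
        }
    }
}
{code}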