[ https://issues.apache.org/jira/browse/HIVE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038842#comment-13038842 ]
jirapos...@reviews.apache.org commented on HIVE-2156:
-----------------------------------------------------

bq. On 2011-05-24 20:49:24, Ning Zhang wrote:
bq. > ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java, line 110
bq. > <https://reviews.apache.org/r/777/diff/2/?file=19557#file19557line110>
bq. >
bq. > Do you have some numbers on how long it takes to get all the TaskCompletionEvents? There are cases where a job may have more than 10k tasks and all of them failed with the same error.
bq. >
bq. > If it takes too long you may want to consider adding a threshold to the time spent in getting all the TaskCompletionEvents.

I have only tested it on some of the queries in the NegativeCliDriver tests, where it usually takes <10s running in miniMR cluster mode. There is a coarse timeout (default 5 minutes, configurable in HiveConf.ConfVars.JOB_DEBUG_TIMEOUT), enforced by HadoopJobExecHelper, on getting all TaskCompletionEvents before we stop. It would make sense, though, to time out the fetching of TaskCompletionEvents specifically and then print out the information obtained so far. What this patch does instead is throw away the TaskCompletionEvents gathered so far and return the "could not obtain debugging info" message. Does that sound reasonable, or do you think the coarse timeout would be sufficient?

bq. On 2011-05-24 20:49:24, Ning Zhang wrote:
bq. > ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java, line 571
bq. > <https://reviews.apache.org/r/777/diff/2/?file=19556#file19556line571>
bq. >
bq. > error code -101 is also used in TaskRunner.java to indicate OOM exception. We should define all these error codes in a centralized place.

This was just used as something to initialize exitVal to; that specific value should never be returned unless the call to runningJob.waitFor() returns the same value.
I can change it to something else just to avoid the collision, but should we do both the consolidation of exit codes and the change to showJobDebugInfo in the same patch? They seem like different changes, and consolidating the exit codes would require touching several other parts of MapredLocalTask, MapRedTask and ExecDriver. Would these changes fit better in a separate patch?

- Syed

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/777/#review711
-----------------------------------------------------------

On 2011-05-24 04:29:32, Syed Albiz wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/777/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-05-24 04:29:32)
bq.
bq.
bq. Review request for hive and John Sichi.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Add local error messages to point to job logs and provide TaskIDs
bq. - Add a timeout to the fetching of task logs and errors
bq.
bq.
bq. This addresses bug HIVE-2156.
bq. https://issues.apache.org/jira/browse/HIVE-2156
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.   build-common.xml 00c3680
bq.   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dc96a1f
bq.   conf/hive-default.xml 159d825
bq.   ql/build.xml 449b47a
bq.   ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 4717c25
bq.   ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java PRE-CREATION
bq.   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0
bq.   ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 691f038
bq.   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9cb407c
bq.   ql/src/test/queries/clientnegative/minimr_broken_pipe.q PRE-CREATION
bq.   ql/src/test/results/clientnegative/dyn_part3.q.out 5f4df65
bq.   ql/src/test/results/clientnegative/minimr_broken_pipe.q.out PRE-CREATION
bq.   ql/src/test/results/clientnegative/script_broken_pipe1.q.out d33d2cc
bq.   ql/src/test/results/clientnegative/script_broken_pipe2.q.out afbaa44
bq.   ql/src/test/results/clientnegative/script_broken_pipe3.q.out fe8f757
bq.   ql/src/test/results/clientnegative/script_error.q.out c72d780
bq.   ql/src/test/results/clientnegative/udf_reflect_neg.q.out f2082a3
bq.   ql/src/test/results/clientnegative/udf_test_error.q.out 5fd9a00
bq.   ql/src/test/results/clientnegative/udf_test_error_reduce.q.out ddc5e5b
bq.   ql/src/test/templates/TestNegativeCliDriver.vm ec13f79
bq.
bq. Diff: https://reviews.apache.org/r/777/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Tested TestNegativeCliDriver in both local and miniMR mode
bq.
bq.
bq. Thanks,
bq.
bq. Syed
bq.
bq.

> Improve error messages emitted during task execution
> ----------------------------------------------------
>
>                 Key: HIVE-2156
>                 URL: https://issues.apache.org/jira/browse/HIVE-2156
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Syed S. Albiz
>            Assignee: Syed S. Albiz
>         Attachments: HIVE-2156.1.patch, HIVE-2156.2.patch
>
>
> Follow-up to HIVE-1731
> A number of issues were related to reporting errors from task execution and
> surfacing these in a more useful form.
> Currently a cryptic message with "Execution Error" and a return code and
> class name of the task is emitted.
> The most useful log messages here are emitted to the local logs, which can be
> found through jobtracker. Having either a pointer to these logs as part of
> the error message or the actual content would improve the usefulness
> substantially. It may also warrant looking into how the underlying error
> reporting through Hadoop is done and if more information can be propagated up
> from there.
> Specific issues raised in HIVE-1731:
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
> * issue was in regexp_extract syntax
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
> * tried: desc table_does_not_exist;

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
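
The finer-grained timeout discussed in the review reply above (stop fetching TaskCompletionEvents once a deadline passes, but keep and report what was gathered) could look roughly like the sketch below. It is not code from the patch: `PartialEventFetch`, `EventSource`, `Result`, and `fetchWithDeadline` are hypothetical names, events are plain strings, and `EventSource` only stands in for a paged call that returns events starting at a given index. The deadline is checked between batches, so a single slow call is not interrupted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

class PartialEventFetch {

    /** Hypothetical paged event API: returns an empty list once no events remain. */
    interface EventSource {
        List<String> fetch(int startFrom);
    }

    /** Events gathered so far, plus whether the deadline cut the fetch short. */
    static final class Result {
        final List<String> events;
        final boolean timedOut;

        Result(List<String> events, boolean timedOut) {
            this.events = Collections.unmodifiableList(events);
            this.timedOut = timedOut;
        }
    }

    /**
     * Fetches batches until the source is exhausted or the deadline passes.
     * On timeout the partial list is returned instead of being discarded.
     * The clock is injected (assumption: milliseconds) so it can be faked in tests.
     */
    static Result fetchWithDeadline(EventSource source, long timeoutMillis,
                                    LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + timeoutMillis;
        List<String> gathered = new ArrayList<>();
        while (true) {
            if (clockMillis.getAsLong() >= deadline) {
                return new Result(gathered, true);   // keep partial results
            }
            List<String> batch = source.fetch(gathered.size());
            if (batch.isEmpty()) {
                return new Result(gathered, false);  // all events retrieved
            }
            gathered.addAll(batch);
        }
    }

    public static void main(String[] args) {
        // Toy source with three events, served at most two per call.
        List<String> all = Arrays.asList("task_0 FAILED", "task_1 FAILED", "task_2 FAILED");
        EventSource source = from -> all.subList(from, Math.min(from + 2, all.size()));

        Result r = fetchWithDeadline(source, 5_000L, System::currentTimeMillis);
        System.out.println((r.timedOut ? "Partial" : "Complete") + " debug info: " + r.events);
    }
}
```

A caller would print `events` in both cases and use `timedOut` only to note that the list may be incomplete, rather than falling back to a "could not obtain debugging info" message.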