[ 
https://issues.apache.org/jira/browse/HIVE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038842#comment-13038842
 ] 

jirapos...@reviews.apache.org commented on HIVE-2156:
-----------------------------------------------------



bq.  On 2011-05-24 20:49:24, Ning Zhang wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java, line 110
bq.  > <https://reviews.apache.org/r/777/diff/2/?file=19557#file19557line110>
bq.  >
bq.  >     Do you have some numbers on how long it takes to get all the TaskCompletionEvents? There are cases where a job may have more than 10k tasks, all of which failed with the same error.
bq.  >     
bq.  >     If it takes too long, you may want to consider adding a threshold to the time spent getting all the TaskCompletionEvents.

I have only tested it on some of the queries in the NegativeCliDriver tests, 
where it usually takes <10s running in miniMR cluster mode. HadoopJobExecHelper 
already enforces a coarse timeout (default 5 minutes, configurable in 
HiveConf.ConfVars.JOB_DEBUG_TIMEOUT) on getting all TaskCompletionEvents before 
we stop. It would still make sense to time out the TaskCompletionEvents fetch 
specifically and then print the information obtained so far. What this patch 
does instead is throw away the taskCompletionEvents gathered so far and return 
the "could not obtain debugging info" message. Does that sound reasonable, or 
do you think the coarse timeout would be sufficient?
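
For illustration only, the "time out the fetch and keep what was gathered" idea could look roughly like the sketch below. It is not code from the patch: `EventSource`, `FetchResult` and `BoundedEventFetcher` are hypothetical names, plain strings stand in for TaskCompletionEvent objects, and the injected clock exists only to make the behavior easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a paged event source such as a running job. */
interface EventSource {
    /** Returns the next batch of events starting at startFrom; empty when exhausted. */
    String[] fetchEvents(int startFrom);
}

/** Whatever was gathered, plus a flag saying whether the deadline cut the fetch short. */
final class FetchResult {
    final List<String> events;
    final boolean timedOut;

    FetchResult(List<String> events, boolean timedOut) {
        this.events = events;
        this.timedOut = timedOut;
    }
}

final class BoundedEventFetcher {
    private BoundedEventFetcher() {}

    /**
     * Fetches batches until the source is exhausted or the deadline passes.
     * On timeout the partial list is returned instead of being discarded.
     * The deadline is checked between batches, so a single slow batch is
     * not interrupted (an assumption of this sketch).
     */
    static FetchResult fetchWithTimeout(EventSource source, long timeoutMillis,
                                        LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + timeoutMillis;
        List<String> gathered = new ArrayList<>();
        while (true) {
            if (clockMillis.getAsLong() >= deadline) {
                return new FetchResult(gathered, true);
            }
            String[] batch = source.fetchEvents(gathered.size());
            if (batch == null || batch.length == 0) {
                return new FetchResult(gathered, false);
            }
            for (String event : batch) {
                gathered.add(event);
            }
        }
    }
}
```

A caller would pass `System::currentTimeMillis` as the clock and, when `timedOut` is true, print the debugging information derived from `events` together with a note that the list may be incomplete.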


bq.  On 2011-05-24 20:49:24, Ning Zhang wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java, line 571
bq.  > <https://reviews.apache.org/r/777/diff/2/?file=19556#file19556line571>
bq.  >
bq.  >     Error code -101 is also used in TaskRunner.java to indicate an OOM exception. We should define all these error codes in a centralized place.

This was just used as something to initialize exitVal to; that specific value 
should never be returned unless the call to runningJob.waitFor() returns the 
same value. I can change it to something else just to avoid the collision, 
but should we do both the consolidation of exit codes and the change to 
showJobDebugInfo in the same patch? They seem like different changes, and 
consolidating the exit codes would require touching several other parts of 
MapredLocalTask, MapRedTask and ExecDriver. Would these changes fit better in a 
separate patch?
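
As a rough picture of what "a centralized place" for exit codes might mean, here is a minimal sketch. It is an assumption for discussion, not Hive's actual code: the enum name `TaskExitCode`, the `SUCCESS` and `UNINITIALIZED` constants and their values are hypothetical; only -101 for OOM comes from the review comment above.

```java
import java.util.Optional;

/** Hypothetical single home for task exit codes, so values cannot silently collide. */
enum TaskExitCode {
    SUCCESS(0),            // hypothetical value
    OUT_OF_MEMORY(-101),   // the value the review says TaskRunner.java uses for OOM
    UNINITIALIZED(-102);   // hypothetical placeholder for an exitVal that was never set

    private final int code;

    TaskExitCode(int code) {
        this.code = code;
    }

    /** Numeric value to hand back to the caller. */
    int code() {
        return code;
    }

    /** Looks up the constant for a raw exit value, if one is registered. */
    static Optional<TaskExitCode> fromCode(int code) {
        for (TaskExitCode candidate : values()) {
            if (candidate.code == code) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With something like this, the initial value of exitVal would be `TaskExitCode.UNINITIALIZED.code()` rather than a literal that happens to match the OOM code.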


- Syed


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/777/#review711
-----------------------------------------------------------


On 2011-05-24 04:29:32, Syed Albiz wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/777/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-05-24 04:29:32)
bq.  
bq.  
bq.  Review request for hive and John Sichi.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Add local error messages to point to job logs and provide TaskIDs
bq.  - Add a timeout to the fetching of task logs and errors
bq.  
bq.  
bq.  This addresses bug HIVE-2156.
bq.      https://issues.apache.org/jira/browse/HIVE-2156
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    build-common.xml 00c3680 
bq.    common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dc96a1f 
bq.    conf/hive-default.xml 159d825 
bq.    ql/build.xml 449b47a 
bq.    ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 4717c25 
bq.    ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
bq.    ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 691f038 
bq.    ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9cb407c 
bq.    ql/src/test/queries/clientnegative/minimr_broken_pipe.q PRE-CREATION 
bq.    ql/src/test/results/clientnegative/dyn_part3.q.out 5f4df65 
bq.    ql/src/test/results/clientnegative/minimr_broken_pipe.q.out PRE-CREATION 
bq.    ql/src/test/results/clientnegative/script_broken_pipe1.q.out d33d2cc 
bq.    ql/src/test/results/clientnegative/script_broken_pipe2.q.out afbaa44 
bq.    ql/src/test/results/clientnegative/script_broken_pipe3.q.out fe8f757 
bq.    ql/src/test/results/clientnegative/script_error.q.out c72d780 
bq.    ql/src/test/results/clientnegative/udf_reflect_neg.q.out f2082a3 
bq.    ql/src/test/results/clientnegative/udf_test_error.q.out 5fd9a00 
bq.    ql/src/test/results/clientnegative/udf_test_error_reduce.q.out ddc5e5b 
bq.    ql/src/test/templates/TestNegativeCliDriver.vm ec13f79 
bq.  
bq.  Diff: https://reviews.apache.org/r/777/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Tested TestNegativeCliDriver in both local and miniMR mode
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Syed
bq.  
bq.



> Improve error messages emitted during task execution
> ----------------------------------------------------
>
>                 Key: HIVE-2156
>                 URL: https://issues.apache.org/jira/browse/HIVE-2156
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Syed S. Albiz
>            Assignee: Syed S. Albiz
>         Attachments: HIVE-2156.1.patch, HIVE-2156.2.patch
>
>
> Follow-up to HIVE-1731
> A number of issues were related to reporting errors from task execution and 
> surfacing these in a more useful form.
> Currently a cryptic message with "Execution Error" and a return code and 
> class name of the task is emitted.
> The most useful log messages here are emitted to the local logs, which can be 
> found through jobtracker. Having either a pointer to these logs as part of 
> the error message or the actual content would improve the usefulness 
> substantially. It may also warrant looking into how the underlying error 
> reporting through Hadoop is done and if more information can be propagated up 
> from there.
> Specific issues raised in HIVE-1731:
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> * issue was in regexp_extract syntax
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
> * tried: desc table_does_not_exist;

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
