> On 2011-05-24 20:49:24, Ning Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java, line 110
> > <https://reviews.apache.org/r/777/diff/2/?file=19557#file19557line110>
> >
> >     Do you have some numbers on how long it takes to get all the 
> > TaskCompletionEvents? There are cases where a job may have more than 10k 
> > tasks and all of them fail with the same error.
> >     
> >     If it takes too long you may want to consider adding a threshold to the 
> > time spent in getting all the TaskCompletionEvents.

I have only tested it on some of the queries in the NegativeCliDriver tests, 
where it usually takes <10s running in miniMR cluster mode. HadoopJobExecHelper 
already enforces a coarse timeout (default 5 minutes, configurable in 
HiveConf.ConfVars.JOB_DEBUG_TIMEOUT) on getting all the TaskCompletionEvents 
before we stop. However, it would make sense to time out the fetching of 
TaskCompletionEvents specifically, and then print out the information obtained 
so far. What this patch does instead is throw away the TaskCompletionEvents 
gathered so far and return the "could not obtain debugging info" message. Does 
that sound reasonable, or do you think the coarse timeout would be sufficient?
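
To make the idea concrete, here is a rough sketch of a fetch loop with its own 
deadline that keeps whatever it has gathered when time runs out. It is not the 
code in this patch: BoundedEventFetch, Result, fetchAll and the fetchBatch 
callback are hypothetical names, and the callback only stands in for whatever 
call returns the next batch of TaskCompletionEvents starting at a given offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper, not part of Hive or of this patch.
final class BoundedEventFetch {

    /** Events gathered so far, plus a flag saying whether the deadline cut the fetch short. */
    static final class Result<T> {
        final List<T> events;
        final boolean timedOut;

        Result(List<T> events, boolean timedOut) {
            this.events = events;
            this.timedOut = timedOut;
        }
    }

    /**
     * Pulls batches until the source returns an empty batch or the deadline passes.
     * fetchBatch is an assumed callback: given a start offset, it returns the next
     * batch of events (empty or null when there are no more).
     */
    static <T> Result<T> fetchAll(IntFunction<List<T>> fetchBatch, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        List<T> events = new ArrayList<>();
        while (true) {
            // The deadline is only checked between batches, so a single
            // blocked fetch call is not interrupted by this loop.
            if (System.nanoTime() - deadline >= 0) {
                return new Result<>(events, true);   // keep the partial results
            }
            List<T> batch = fetchBatch.apply(events.size());
            if (batch == null || batch.isEmpty()) {
                return new Result<>(events, false);  // everything was fetched
            }
            events.addAll(batch);
        }
    }
}
```

A caller could then print the debugging info for result.events in either case, 
and add a note that the list may be incomplete when result.timedOut is true.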


> On 2011-05-24 20:49:24, Ning Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java, line 
> > 571
> > <https://reviews.apache.org/r/777/diff/2/?file=19556#file19556line571>
> >
> >     error code -101 is also used in TaskRunner.java to indicate an OOM 
> > exception. We should define all these error codes in a centralized place.

This was just used to initialize exitVal; that specific value should never be 
returned unless the call to runningJob.waitFor() returns the same value. I can 
change it to something else just to avoid the collision, but should we do both 
the consolidation of exit codes and the change to showJobDebugInfo in the same 
patch? They seem like different changes, and consolidating the exit codes would 
require touching several other parts of MapredLocalTask, MapRedTask and 
ExecDriver. Would these changes fit better in a separate patch?
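
For the follow-up, a centralized definition could look something like the 
sketch below. This is only an illustration: ExitCode and its constants are 
hypothetical names, and every value except -101 for OOM (taken from the comment 
above) is made up. The static check fails fast if two constants ever share a 
code, which is the kind of collision being discussed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical enum, not an existing Hive class.
enum ExitCode {
    SUCCESS(0, "job completed normally"),           // value assumed for illustration
    JOB_FAILED(2, "job failed"),                    // value assumed for illustration
    OUT_OF_MEMORY(-101, "task ran out of memory");  // -101 per the review comment

    private static final Map<Integer, ExitCode> BY_CODE = new HashMap<>();

    static {
        // Reject duplicate numeric codes as soon as the enum is loaded.
        for (ExitCode e : values()) {
            ExitCode previous = BY_CODE.put(e.code, e);
            if (previous != null) {
                throw new IllegalStateException(
                    "Exit code " + e.code + " is used by both " + previous + " and " + e);
            }
        }
    }

    private final int code;
    private final String description;

    ExitCode(int code, String description) {
        this.code = code;
        this.description = description;
    }

    int code() {
        return code;
    }

    String description() {
        return description;
    }

    /** Looks up the constant for a numeric exit value, if one is defined. */
    static Optional<ExitCode> fromCode(int code) {
        return Optional.ofNullable(BY_CODE.get(code));
    }
}
```

Callers would return ExitCode.OUT_OF_MEMORY.code() instead of a literal, and 
could use ExitCode.fromCode(exitVal) when printing a message for a failed job.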


- Syed


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/777/#review711
-----------------------------------------------------------


On 2011-05-24 04:29:32, Syed Albiz wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/777/
> -----------------------------------------------------------
> 
> (Updated 2011-05-24 04:29:32)
> 
> 
> Review request for hive and John Sichi.
> 
> 
> Summary
> -------
> 
> - Add local error messages to point to job logs and provide TaskIDs
> - Add a timeout to the fetching of task logs and errors
> 
> 
> This addresses bug HIVE-2156.
>     https://issues.apache.org/jira/browse/HIVE-2156
> 
> 
> Diffs
> -----
> 
>   build-common.xml 00c3680 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dc96a1f 
>   conf/hive-default.xml 159d825 
>   ql/build.xml 449b47a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 4717c25 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 691f038 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9cb407c 
>   ql/src/test/queries/clientnegative/minimr_broken_pipe.q PRE-CREATION 
>   ql/src/test/results/clientnegative/dyn_part3.q.out 5f4df65 
>   ql/src/test/results/clientnegative/minimr_broken_pipe.q.out PRE-CREATION 
>   ql/src/test/results/clientnegative/script_broken_pipe1.q.out d33d2cc 
>   ql/src/test/results/clientnegative/script_broken_pipe2.q.out afbaa44 
>   ql/src/test/results/clientnegative/script_broken_pipe3.q.out fe8f757 
>   ql/src/test/results/clientnegative/script_error.q.out c72d780 
>   ql/src/test/results/clientnegative/udf_reflect_neg.q.out f2082a3 
>   ql/src/test/results/clientnegative/udf_test_error.q.out 5fd9a00 
>   ql/src/test/results/clientnegative/udf_test_error_reduce.q.out ddc5e5b 
>   ql/src/test/templates/TestNegativeCliDriver.vm ec13f79 
> 
> Diff: https://reviews.apache.org/r/777/diff
> 
> 
> Testing
> -------
> 
> Tested TestNegativeCliDriver in both local and miniMR mode
> 
> 
> Thanks,
> 
> Syed
> 
>
