GenericUDTFJSONTuple ignores IOExceptions
-----------------------------------------

                 Key: HIVE-2671
                 URL: https://issues.apache.org/jira/browse/HIVE-2671
             Project: Hive
          Issue Type: Bug
          Components: UDF
            Reporter: Dmytro Molkov


A query that uses GenericUDTFJSONTuple can hit a very nasty bug.
If the write pipeline fails, the task does not detect the failure and simply 
starts skipping all the remaining rows in the input.

The UDTF has a catch (Throwable) block that catches the IOException and 
forwards null rows instead. My guess is that these null rows are filtered out 
by the filter operator down the line, so the map task never tries to write 
them out.
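
The pattern described above can be sketched in isolation as follows. This is 
not the actual Hive code: the class, the Collector interface, ParseException 
and the toy parse() method are all hypothetical stand-ins, and the "broken 
sink" only imitates a failed write pipeline followed by a filter that drops 
null rows. It contrasts a catch (Throwable) that hides the write failure with 
a variant that treats only parse errors as bad input.

```java
import java.io.IOException;

public class SwallowedIOExceptionSketch {

    /** Hypothetical stand-in for the downstream operator chain ending in a writer. */
    public interface Collector {
        void collect(Object[] row) throws IOException;
    }

    /** Hypothetical stand-in for a JSON parse failure. */
    public static class ParseException extends Exception {
        public ParseException(String msg) {
            super(msg);
        }
    }

    // Toy "parser" (assumption): anything not starting with '{' is malformed.
    public static Object[] parse(String json) throws ParseException {
        if (json == null || !json.startsWith("{")) {
            throw new ParseException("malformed input: " + json);
        }
        return new Object[] { json };
    }

    /** Problem pattern: a write failure is caught and handled like bad input. */
    public static void processSwallowing(String json, Collector out) throws IOException {
        try {
            out.collect(parse(json));
        } catch (Throwable t) {
            // An IOException thrown by collect() ends up here as well,
            // and a null row is forwarded in place of the real one.
            out.collect(new Object[] { null });
        }
    }

    /** Alternative: only parse errors yield a null row; write failures propagate. */
    public static void processPropagating(String json, Collector out) throws IOException {
        Object[] row;
        try {
            row = parse(json);
        } catch (ParseException e) {
            row = new Object[] { null };
        }
        out.collect(row); // an IOException here reaches the caller
    }

    public static void main(String[] args) throws IOException {
        // Imitates a failed write pipeline: null rows are dropped (as a filter
        // would do), every real row fails to write.
        Collector brokenSink = row -> {
            if (row[0] == null) {
                return;
            }
            throw new IOException("write pipeline failed");
        };
        String[] input = { "{\"a\":1}", "{\"a\":2}", "{\"a\":3}" };

        for (String json : input) {
            processSwallowing(json, brokenSink);
        }
        System.out.println("swallowing variant: all rows consumed, failure never surfaced");

        try {
            processPropagating(input[0], brokenSink);
        } catch (IOException e) {
            System.out.println("propagating variant failed fast: " + e.getMessage());
        }
    }
}
```

In the second variant the first failed write aborts processing, so the task 
can fail and be retried instead of silently consuming the rest of the input.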

This happens for every row in the input. As a result the query appears to run 
forever, since it produces a log message for every row (we've seen tasks run 
for 20 hours instead of 20 minutes).

This is a stack trace of one of the tasks just in case:
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315)
- locked <0x000000009c174f78> (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
- locked <0x000000009c18d4f8> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
- locked <0x000000009c18d4d8> (a java.io.DataOutputStream)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:894)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:875)
at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.process(GenericUDTFJSONTuple.java:167)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:368)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
at org.apache.hadoop.mapred.Child.main(Child.java:162)


        
