GenericUDTFJSONTuple ignores IOExceptions
-----------------------------------------
Key: HIVE-2671
URL: https://issues.apache.org/jira/browse/HIVE-2671
Project: Hive
Issue Type: Bug
Components: UDF
Reporter: Dmytro Molkov
A query that uses GenericUDTFJSONTuple can hit a very nasty bug.
If the write pipeline fails, the task does not detect the failure and simply
starts skipping every remaining row in the input.
The UDTF has a catch (Throwable) block that catches the IOException and
forwards a null row instead. My guess is that these null rows are filtered out
by the filter operator further down the line, so the map task never tries to
write them out.
This happens for every row in the input.
As a result the query takes far longer than it should, since it produces a log
message for every row (we've seen tasks run for 20 hours instead of 20 minutes).
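The failure mode described above can be sketched in a few lines of Java. This
is only an illustration under stated assumptions, not the Hive source: the
class name, Collector, BrokenSink, and parse are hypothetical stand-ins, and
the sketch throws a bare IOException where the real operator chain may wrap it
in another exception type. It contrasts a catch (Throwable) wrapped around the
forward call with a narrower catch that covers only the parsing step.

```java
import java.io.IOException;

// Hypothetical sketch; none of these names come from the Hive code base.
class JsonTupleErrorHandlingSketch {

    /** Stand-in for the downstream operators (filter, select, file sink). */
    interface Collector {
        void collect(String value) throws IOException;
    }

    /** A sink whose write pipeline is broken: null rows are filtered, real rows fail. */
    static class BrokenSink implements Collector {
        int droppedNulls = 0;

        @Override
        public void collect(String value) throws IOException {
            if (value == null) {
                droppedNulls++; // filtered out before any write is attempted
                return;
            }
            throw new IOException("write pipeline failed");
        }
    }

    /** Toy parser: accepts anything that starts with '{'. */
    static String parse(String json) {
        if (json == null || !json.startsWith("{")) {
            throw new IllegalArgumentException("not JSON: " + json);
        }
        return json;
    }

    /** Problematic shape: the catch also covers the forward call. */
    static void processSwallowing(String input, Collector out) throws IOException {
        try {
            out.collect(parse(input));
        } catch (Throwable t) {
            // The write failure lands here and is replaced by a null row.
            out.collect(null);
        }
    }

    /** Narrower shape: only parse errors yield a null row; write errors escape. */
    static void processPropagating(String input, Collector out) throws IOException {
        String parsed;
        try {
            parsed = parse(input);
        } catch (IllegalArgumentException e) {
            parsed = null;
        }
        out.collect(parsed);
    }

    /** Feeds valid rows to the swallowing variant and counts the silent drops. */
    static int rowsSilentlyDropped(int rows) {
        BrokenSink sink = new BrokenSink();
        try {
            for (int i = 0; i < rows; i++) {
                processSwallowing("{\"a\":1}", sink);
            }
        } catch (IOException e) {
            throw new AssertionError("not expected: the catch-all hides the failure", e);
        }
        return sink.droppedNulls;
    }

    /** Returns true if the write failure reaches the caller. */
    static boolean failureSurfaces() {
        try {
            processPropagating("{\"a\":1}", new BrokenSink());
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rows silently dropped: " + rowsSilentlyDropped(3));
        System.out.println("failure surfaces: " + failureSurfaces());
    }
}
```

With the broken sink, the swallowing variant turns every valid row into a
dropped null row and never reports an error, while the narrower variant lets
the first write failure reach the caller so the task can fail fast.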
This is a stack trace of one of the tasks just in case:
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315)
- locked <0x000000009c174f78> (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
- locked <0x000000009c18d4f8> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
- locked <0x000000009c18d4d8> (a java.io.DataOutputStream)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:894)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:875)
at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.process(GenericUDTFJSONTuple.java:167)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:368)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
at org.apache.hadoop.mapred.Child.main(Child.java:162)