[ https://issues.apache.org/jira/browse/HIVE-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated HIVE-16368:
------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Zhihai!

> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView Operation for hive on MR.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16368
>                 URL: https://issues.apache.org/jira/browse/HIVE-16368
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16368.000.patch, HIVE-16368.001.patch, HIVE-16368.002.patch
>
>
> A query fails with an unexpected java.lang.ArrayIndexOutOfBoundsException.
> It happens in the LateralView operation, for Hive on MR. The cause is that
> column pruning changes the column order in the LateralView operation. For
> back-to-back ReduceSink operators on the MR engine, a FileSinkOperator and a
> TableScanOperator are added before the second ReduceSink operator. The
> serialization column order used by the FileSinkOperator in LazyBinarySerDe of
> the previous reducer differs from the deserialization column order, taken
> from the table desc, used by the MapOperator/TableScanOperator in
> LazyBinarySerDe of the current, failing mapper.
> The serialization order is decided by the outputObjInspector from
> LateralViewJoinOperator:
> {code}
> ArrayList<String> fieldNames = conf.getOutputInternalColNames();
> outputObjInspector = ObjectInspectorFactory
>     .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is decided by getOutputInternalColNames
> in LateralViewJoinOperator.
> The deserialization order is decided by the TableScanOperator, which is
> created in GenMapRedUtils.splitTasks:
> {code}
> TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
>     .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
> // Create the temporary file, its corresponding FileSinkOperator, and
> // its corresponding TableScanOperator.
> TableScanOperator tableScanOp =
>     createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> The column order for deserialization is decided by the rowSchema of
> LateralViewJoinOperator.
> But ColumnPrunerLateralViewJoinProc changes the order of
> outputInternalColNames while keeping the original order of rowSchema,
> which causes the mismatch between serialization and deserialization for two
> back-to-back MR jobs.
> ColumnPrunerLateralViewForwardProc has a similar issue: it changes the column
> order of its child selector's colList but not the rowSchema.
> The exception is
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>       at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>       at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
>       at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
> {code}
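
The serialization/deserialization mismatch described in the issue can be reproduced in miniature without Hive. The following is a minimal sketch under stated assumptions, not Hive code: ColumnOrderMismatchDemo, Type, serialize, and describeRead are hypothetical names, and java.io.DataOutputStream stands in for LazyBinarySerDe. A row is written in one column order and read back in another, so the reader takes the leading bytes of a double for a string length and fails. This is presumably analogous to the nonsensical array index in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these names come from Hive.
class ColumnOrderMismatchDemo {

    // Toy column types, standing in for Hive's real type system.
    enum Type { STRING, DOUBLE }

    // Writer side: emits the fields of `row` in the order given by `order`.
    static byte[] serialize(List<Type> order, List<Object> row) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < order.size(); i++) {
                if (order.get(i) == Type.STRING) {
                    out.writeUTF((String) row.get(i));    // 2-byte length, then bytes
                } else {
                    out.writeDouble((Double) row.get(i)); // 8 bytes
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reader side: decodes `data` assuming the fields appear in `order`.
    // Returns the decoded row as text, or "failed: <exception>" on error.
    static String describeRead(List<Type> order, byte[] data) {
        List<Object> row = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (Type t : order) {
                if (t == Type.STRING) {
                    row.add(in.readUTF());
                } else {
                    row.add(in.readDouble());
                }
            }
        } catch (IOException e) {
            return "failed: " + e.getClass().getSimpleName();
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Order used by the writer (plays the role of outputInternalColNames).
        List<Type> writerOrder = Arrays.asList(Type.DOUBLE, Type.STRING);
        // Order the reader still believes in (plays the role of the stale rowSchema).
        List<Type> staleOrder = Arrays.asList(Type.STRING, Type.DOUBLE);

        byte[] data = serialize(writerOrder, Arrays.<Object>asList(3.14, "abc"));

        System.out.println("same order:      " + describeRead(writerOrder, data));
        System.out.println("different order: " + describeRead(staleOrder, data));
    }
}
```

Reading with the writer's order returns the original row, while reading with the other order fails. Keeping the two orders in sync, which the description says the column pruner did not do for rowSchema, avoids the failure.
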
This message was sent by Atlassian JIRA (v6.3.15#6346)