[ https://issues.apache.org/jira/browse/HIVE-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954570#comment-15954570 ]
zhihai xu commented on HIVE-16368:
----------------------------------

The query plan of Reduce Operator Tree for the MR job with LateralViewJoinOperator is
{code}
| Stage: Stage-3
|   Map Reduce
|     Map Operator Tree:
| ......
|     Reduce Operator Tree:
|       Join Operator
|         condition map:
|           Inner Join 0 to 1
|         keys:
|           0 _col7 (type: string)
|           1 msg.chain_uuid (type: string)
|         outputColumnNames: _col0, _col3, _col4, _col5, _col7, _col8, _col9, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col31, _col32, _col33, _col34, _col35, _col36, _col44
|         Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
|         Select Operator
|           expressions: _col0 (type: string), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col7 (type: string), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col27 (type: string), _col28 (type: array<string>), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col44.all_points (type: array<struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>>)
|           outputColumnNames: _col0, _col3, _col4, _col5, _col7, _col8, _col9, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col31, _col32, _col33, _col34, _col35, _col36, _col37
|           Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
|           Lateral View Forward
|             Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
|             Select Operator
|               expressions: _col0 (type: string), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col27 (type: string), _col28 (type: array<string>), _col3 (type: string), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col4 (type: bigint), _col5 (type: bigint), _col7 (type: string), _col8 (type: string), _col9 (type: int)
|               outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9
|               Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
|               Lateral View Join Operator
|                 outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9, _col38
|                 Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE
|                 File Output Operator
|                   compressed: false
|                   table:
|                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
|                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
|                     serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
|             Select Operator
|               expressions: _col37 (type: array<struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>>)
|               outputColumnNames: _col0
|               Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
|               UDTF Operator
|                 Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
|                 function name: explode
|                 Lateral View Join Operator
|                   outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9, _col38
|                   Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE
|                   File Output Operator
|                     compressed: false
|                     table:
|                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
|                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
|                       serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
{code}
The query plan of Map Operator Tree for the MR job with the following TableScanOperator is:
{code}
| Stage: Stage-4
|   Map Reduce
|     Map Operator Tree:
|       TableScan
|         Reduce Output Operator
|           key expressions: _col27 (type: string), _col7 (type: string), _col38.ts (type: bigint)
|           sort order: +++
|           Map-reduce partition columns: _col27 (type: string), _col7 (type: string)
|           Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE
|           value expressions: _col0 (type: string), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col28 (type: array<string>), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col38 (type:
struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>)
|     Reduce Operator Tree:
{code}

> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView
> Operation for hive on MR.
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-16368
>                 URL: https://issues.apache.org/jira/browse/HIVE-16368
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>
> A query throws an unexpected java.lang.ArrayIndexOutOfBoundsException in the
> LateralView operation. It happens for Hive on MR. The cause is that column
> pruning changes the column order in the LateralView operation. For
> back-to-back ReduceSink operators on the MR engine, a FileSinkOperator and a
> TableScanOperator are added before the second ReduceSink operator. The
> serialization column order used by the FileSinkOperator in LazyBinarySerDe of
> the previous reducer differs from the deserialization column order, taken
> from the table desc, used by the MapOperator/TableScanOperator in
> LazyBinarySerDe of the current, failing mapper.
> The serialization order is decided by the outputObjInspector of
> LateralViewJoinOperator:
> {code}
> ArrayList<String> fieldNames = conf.getOutputInternalColNames();
> outputObjInspector = ObjectInspectorFactory
>     .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is decided by
> getOutputInternalColNames in LateralViewJoinOperator.
> The deserialization order is decided by the TableScanOperator, which is
> created in GenMapRedUtils.splitTasks:
> {code}
> TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
>     .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
> // Create the temporary file, its corresponding FileSinkOperator, and
> // its corresponding TableScanOperator.
> TableScanOperator tableScanOp =
>     createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> So the column order for deserialization is decided by the rowSchema of
> LateralViewJoinOperator.
> But ColumnPrunerLateralViewJoinProc changes the order of
> outputInternalColNames while keeping the original order of rowSchema, which
> causes the mismatch between serialization and deserialization for two
> back-to-back MR jobs.
> ColumnPrunerLateralViewForwardProc has a similar issue: it changes the
> column order of its child Select operator's colList but not the rowSchema.
> The exception is:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>     at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
> {code}
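
To make the serialization/deserialization mismatch described in the issue easier to see, here is a minimal Java sketch. It is not Hive code: the class ColumnOrderSketch and its serialize/deserialize helpers are hypothetical, and the "format" is just a list of values with no column names attached, standing in for a positional binary layout. The writer order mimics the string-sorted outputColumnNames seen in the Lateral View Join Operator of the plan, and the reader order mimics the original rowSchema order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Hive classes.
final class ColumnOrderSketch {

    // "Serialize" a row positionally: values are emitted in the writer's
    // column order, and no column names are stored with the data.
    static List<Object> serialize(List<String> writerOrder, Map<String, Object> row) {
        List<Object> out = new ArrayList<>();
        for (String col : writerOrder) {
            out.add(row.get(col));
        }
        return out;
    }

    // "Deserialize" positionally: the i-th stored value is assigned to the
    // i-th column name of the reader's schema.
    static Map<String, Object> deserialize(List<String> readerOrder, List<Object> data) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < readerOrder.size(); i++) {
            row.put(readerOrder.get(i), data.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        // Made-up sample values for three of the internal column names.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("_col0", "trip-a");
        row.put("_col3", "city-x");
        row.put("_col10", 42);

        // Assumed order after column pruning (plays the role of outputInternalColNames).
        List<String> writerOrder = Arrays.asList("_col0", "_col10", "_col3");
        // Assumed order still recorded in the row schema (used to build the table desc).
        List<String> readerOrder = Arrays.asList("_col0", "_col3", "_col10");

        List<Object> data = serialize(writerOrder, row);
        System.out.println("mismatched schema: " + deserialize(readerOrder, data));
        System.out.println("matching schema:   " + deserialize(writerOrder, data));
    }
}
```

With the mismatched reader order, _col3 comes back holding the integer that was written for _col10, and _col10 holds the string written for _col3. In a real binary SerDe the reader would then interpret the bytes of one type as another, which is consistent with the out-of-range index in the stack trace above, although this sketch does not reproduce that exception.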