[ https://issues.apache.org/jira/browse/HIVE-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar resolved HIVE-10446.
------------------------------------
    Resolution: Fixed

Issue fixed after recent changes to Hybrid Hybrid Grace Hash Join.

Hybrid Hybrid Grace Hash Join : java.lang.IllegalArgumentException in Kryo while spilling big table
---------------------------------------------------------------------------------------------------

                Key: HIVE-10446
                URL: https://issues.apache.org/jira/browse/HIVE-10446
            Project: Hive
         Issue Type: Bug
         Components: Hive
   Affects Versions: 1.2.0
           Reporter: Mostafa Mokhtar
           Assignee: Wei Zheng
            Fix For: 1.2.0

TPC-DS Q85 fails with a Kryo exception while spilling big-table data.

Query
{code}
select substr(r_reason_desc,1,20) as r
      ,avg(wr_return_ship_cost) wq
      ,avg(wr_refunded_cash) ref
      ,avg(wr_fee) fee
from web_returns, customer_demographics cd1,
     customer_demographics cd2, customer_address, date_dim, reason
where
      cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk
  and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
  and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
  and reason.r_reason_sk = web_returns.wr_reason_sk
  and cd1.cd_marital_status = cd2.cd_marital_status
  and cd1.cd_education_status = cd2.cd_education_status
group by r_reason_desc
order by r, wq, ref, fee
limit 100
{code}
Plan
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20150422165209_d8eb5634-c19f-4576-9525-cad248c7ca37:5
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: web_returns
                  filterExpr: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
                  Statistics: Num rows: 2062802370 Data size: 185695406284 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
                    Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_fee (type: float), wr_return_ship_cost (type: float), wr_refunded_cash (type: float)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
                      Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6
                        input vertices:
                          1 Map 4
                        Statistics: Num rows: 1875154688 Data size: 45003712512 Basic stats: COMPLETE Column stats: COMPLETE
                        HybridGraceHashJoin: true
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          keys:
                            0 _col3 (type: int)
                            1 _col0 (type: int)
                          outputColumnNames: _col0, _col2, _col4, _col5, _col6, _col9
                          input vertices:
                            1 Map 5
                          Statistics: Num rows: 1875154688 Data size: 219393098496 Basic stats: COMPLETE Column stats: COMPLETE
                          HybridGraceHashJoin: true
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            keys:
                              0 _col0 (type: int)
                              1 _col0 (type: int)
                            outputColumnNames: _col2, _col4, _col5, _col6, _col9, _col11, _col12
                            input vertices:
                              1 Map 6
                            Statistics: Num rows: 1875154688 Data size: 547545168896 Basic stats: COMPLETE Column stats: COMPLETE
                            HybridGraceHashJoin: true
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col2 (type: int), _col11 (type: string), _col12 (type: string)
                                1 _col0 (type: int), _col1 (type: string), _col2 (type: string)
                              outputColumnNames: _col4, _col5, _col6, _col9
                              input vertices:
                                1 Map 7
                              Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
                              HybridGraceHashJoin: true
                              Select Operator
                                expressions: _col9 (type: string), _col5 (type: float), _col6 (type: float), _col4 (type: float)
                                outputColumnNames: _col0, _col1, _col2, _col3
                                Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
                                Group By Operator
                                  aggregations: avg(_col1), avg(_col2), avg(_col3)
                                  keys: _col0 (type: string)
                                  mode: hash
                                  outputColumnNames: _col0, _col1, _col2, _col3
                                  Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
                                  Reduce Output Operator
                                    key expressions: _col0 (type: string)
                                    sort order: +
                                    Map-reduce partition columns: _col0 (type: string)
                                    Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
                                    value expressions: _col1 (type: struct<count:bigint,sum:double,input:float>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: struct<count:bigint,sum:double,input:float>)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  filterExpr: ca_address_sk is not null (type: boolean)
                  Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ca_address_sk is not null (type: boolean)
                    Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ca_address_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 5
            Map Operator Tree:
                TableScan
                  alias: reason
                  filterExpr: r_reason_sk is not null (type: boolean)
                  Statistics: Num rows: 72 Data size: 14400 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: r_reason_sk is not null (type: boolean)
                    Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: r_reason_sk (type: int), r_reason_desc (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string)
            Execution mode: vectorized
        Map 6
            Map Operator Tree:
                TableScan
                  alias: cd1
                  filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                  Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                    Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 7
            Map Operator Tree:
                TableScan
                  alias: cd1
                  filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                  Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                    Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
                        sort order: +++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: string), _col2 (type: string)
                        Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 25 Data size: 3025 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: substr(_col0, 1, 20) (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
                    sort order: ++++
                    Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                    TopN Hash Memory Usage: 0.04
        Reducer 3
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), KEY.reducesinkkey3 (type: double)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 100
      Processor Tree:
        ListSink
{code}
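All four map joins in Map 1 report HybridGraceHashJoin: true (in the 1.2 line this feature should be controlled by the hive.mapjoin.hybridgrace.hashtable setting, so disabling it is a plausible workaround). When a small-table hash partition no longer fits in memory it is spilled to disk, and every big-table row that hashes to a spilled partition must itself be serialized, via Kryo, into an on-disk row container (the ObjectContainer in the trace below) for a later re-join. A minimal sketch of that spill path, using hypothetical names rather than Hive's actual ObjectContainer code, shows where a missing initialization would surface:

{code}
import java.io.File;
import java.io.FileOutputStream;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Output;

// Simplified stand-in for a big-table row spill container; not Hive's
// ObjectContainer, just the shape of the code path in the trace below.
public class RowSpillContainer {
    private final Kryo kryo = new Kryo();
    private File tmpFile;   // lazily created spill file
    private Output output;  // Kryo stream over tmpFile; null until opened

    // Opens the spill file; must run before the first add().
    private void setupOutput() throws Exception {
        tmpFile = File.createTempFile("bigtable-spill", ".tmp");
        output = new Output(new FileOutputStream(tmpFile));
    }

    public void add(Object row) throws Exception {
        // Without this guard, a row arriving before setupOutput() hands
        // Kryo a null Output and triggers "output cannot be null."
        if (output == null) {
            setupOutput();
        }
        kryo.writeClassAndObject(output, row);
    }
}
{code}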
Exception
{code}
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
        ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
        ... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: output cannot be null.
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:411)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:287)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
        ... 18 more
Caused by: java.lang.IllegalArgumentException: output cannot be null.
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:601)
        at org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer.add(ObjectContainer.java:101)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.spillBigTableRow(MapJoinOperator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:307)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390)
        ... 27 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1426707664723_3652_3_04 [Map 1] killed/failed due to:null]
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1426707664723_3652_3_06, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1426707664723_3652_3_06 [Reducer 3] killed/failed due to:null]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1426707664
{code}
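The innermost cause is Kryo's argument check rather than a serialization failure: writeClassAndObject throws IllegalArgumentException("output cannot be null.") whenever it is handed a null Output, which means ObjectContainer.add was invoked before its backing stream existed when spillBigTableRow ran. The Kryo behavior alone can be reproduced standalone (assuming the kryo jar is on the classpath):

{code}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Output;

public class KryoNullOutputRepro {
    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        try {
            // A null Output reproduces the exact message in the trace above.
            kryo.writeClassAndObject((Output) null, "some big-table row");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints: output cannot be null.
        }
    }
}
{code}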
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)