[
https://issues.apache.org/jira/browse/HIVE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137784#comment-14137784
]
Na Yang commented on HIVE-8162:
-------------------------------
The operator tree for this query is:
TS0-FIL9-SEL2-GBY4-RS5-GBY6-SEL7-RS10-EX11-FS8
The task graph for this query is:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
        TableScan
          alias: associateddata
          Statistics: Num rows: 25374 Data size: 101496 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (sm_campaign_id) IN (10187171, 1090942, 10541943, 10833443, 8635630, 10187170, 9445296, 10696334, 11398585, 9524211, 1145211) (type: boolean)
            Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: map('x_product_id':'') (type: map<string,string>), day_id (type: int)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                keys: _col0 (type: map<string,string>), _col1 (type: int)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: map<string,string>), _col1 (type: int)
                  sort order: ++
                  Map-reduce partition columns: _col0 (type: map<string,string>), _col1 (type: int)
                  Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: map<string,string>), KEY._col1 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: 2 (type: int), _col0 (type: map<string,string>), _col1 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
        TableScan
          Reduce Output Operator
            key expressions: _col2 (type: int), _col0 (type: map<string,string>), _col1 (type: int)
            sort order: +++
            Map-reduce partition columns: _col2 (type: int)
            Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
            value expressions: _col0 (type: int), _col1 (type: map<string,string>), _col2 (type: int)
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
            table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.agg_pv_associateddata_c
  Stage: Stage-0
    Move Operator
      tables:
        partition:
          day_id
        replace: false
        table:
          input format: org.apache.hadoop.mapred.TextInputFormat
          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          name: default.agg_pv_associateddata_c
  Stage: Stage-3
    Stats-Aggr Operator
The exception happens when executing Stage-2. The ReduceSinkDesc for RS10
has key column types {int, map<string,string>, int}, and the intermediate file
for this table is stored using SequenceFileInputFormat with LazyBinarySerDe.
However, LazyBinarySerDe is not able to deserialize the non-primitive type from
the intermediate file, which causes the exception.
When TextInputFormat and LazySimpleSerDe are used for the intermediate file
instead, the exception goes away. However, changing the intermediate file's
InputFormat and SerDe is not a preferred solution.
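For reference, a sketch of how the plan above can be regenerated and how the optimization can be switched off for a session. It assumes the two tables from the issue description already exist; the property name and the query are taken from the issue description, and nothing here is the eventual fix.

```sql
-- Settings from the reproduction steps in the issue description.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Print the stage plan for the failing insert without running it.
EXPLAIN
INSERT INTO TABLE agg_pv_associateddata_c PARTITION (day_id)
select 2 as vt_tran_qty, pair_value_txt, day_id
from (select map('x_product_id', coalesce(F16['x_product_id'], '')) as pair_value_txt,
             day_id, hour_id
      from associateddata
      where hour_id = 2014081417
        and sm_campaign_id in (10187171,1090942,10541943,10833443,8635630,10187170,
                               9445296,10696334,11398585,9524211,1145211)
     ) a
GROUP BY pair_value_txt, day_id;

-- Workaround reported in the issue description: with this optimization off,
-- the exception does not occur in Hive-0.13.
set hive.optimize.sort.dynamic.partition=false;
```

Rerunning the EXPLAIN after the last statement gives a plan to compare against the one above.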
> hive.optimize.sort.dynamic.partition causes RuntimeException for inserting
> into dynamic partitioned table when map function is used in the subquery
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-8162
> URL: https://issues.apache.org/jira/browse/HIVE-8162
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Na Yang
> Attachments: 47rows.txt
>
>
> Exception:
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+++, columns.types=int,map<string,string>,int}
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:462)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
>     at org.apache.hadoop.mapred.Child.main(Child.java:271)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+++, columns.types=int,map<string,string>,int}
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222)
>     ... 7 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
>     at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:189)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:220)
>     ... 7 more
> Caused by: java.io.EOFException
>     at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
>     at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:533)
>     at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:236)
>     at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:185)
>     ... 8 more
> Steps to reproduce the exception:
> -------------------------------------------------
> CREATE TABLE associateddata(creative_id int, creative_group_id int, placement_id int,
>   sm_campaign_id int, browser_id string, trans_type_p string, trans_time_p string,
>   group_name string, event_name string, order_id string, revenue float, currency string,
>   trans_type_ci string, trans_time_ci string, f16 map<string,string>, campaign_id int,
>   user_agent_cat string, geo_country string, geo_city string, geo_state string,
>   geo_zip string, geo_dma string, geo_area string, geo_isp string, site_id int,
>   section_id int, f16_ci map<string,string>)
> PARTITIONED BY (day_id int, hour_id int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> LOAD DATA LOCAL INPATH '/tmp/47rows.txt' INTO TABLE associateddata
>   PARTITION (day_id=20140814, hour_id=2014081417);
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> CREATE EXTERNAL TABLE IF NOT EXISTS agg_pv_associateddata_c (
>   vt_tran_qty int COMMENT 'The count of view thru transactions',
>   pair_value_txt string COMMENT 'F16 name values pairs'
> )
> PARTITIONED BY (day_id int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE
> LOCATION '/user/prodman/agg_pv_associateddata_c';
> INSERT INTO TABLE agg_pv_associateddata_c PARTITION (day_id)
> select 2 as vt_tran_qty, pair_value_txt, day_id
> from (select map('x_product_id', coalesce(F16['x_product_id'], '')) as pair_value_txt,
>              day_id, hour_id
>       from associateddata
>       where hour_id = 2014081417
>         and sm_campaign_id in (10187171,1090942,10541943,10833443,8635630,10187170,9445296,10696334,11398585,9524211,1145211)
>      ) a
> GROUP BY pair_value_txt, day_id;
> The query worked fine in Hive-0.12. It starts failing in Hive-0.13. If
> hive.optimize.sort.dynamic.partition is turned off in Hive-0.13, the
> exception is gone.