[jira] [Commented] (HIVE-8162) hive.optimize.sort.dynamic.partition causes RuntimeException for inserting into dynamic partitioned table when map function is used in the subquery

Na Yang (JIRA) Wed, 17 Sep 2014 12:25:56 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137784#comment-14137784
 ]


Na Yang commented on HIVE-8162:
-------------------------------

The operator tree for this query is:
TS0-FIL9-SEL2-GBY4-RS5-GBY6-SEL7-RS10-EX11-FS8.

The task graph for this query is:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: associateddata
            Statistics: Num rows: 25374 Data size: 101496 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (sm_campaign_id) IN (10187171, 1090942, 10541943, 10833443, 8635630, 10187170, 9445296, 10696334, 11398585, 9524211, 1145211) (type: boolean)
              Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: map('x_product_id':'') (type: map<string,string>), day_id (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  keys: _col0 (type: map<string,string>), _col1 (type: int)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: map<string,string>), _col1 (type: int)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: map<string,string>), _col1 (type: int)
                    Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: map<string,string>), KEY._col1 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: 2 (type: int), _col0 (type: map<string,string>), _col1 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col2 (type: int), _col0 (type: map<string,string>), _col1 (type: int)
              sort order: +++
              Map-reduce partition columns: _col2 (type: int)
              Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: int), _col1 (type: map<string,string>), _col2 (type: int)
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.agg_pv_associateddata_c

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            day_id 
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.agg_pv_associateddata_c

  Stage: Stage-3
    Stats-Aggr Operator

The exception occurs while executing Stage-2. The ReduceSinkDesc for RS10 
has key column types {int, map<string,string>, int}, and the intermediate file 
for this table is stored with SequenceFileInputFormat and LazyBinarySerDe. 
However, LazyBinarySerDe is unable to deserialize the non-primitive type from 
the intermediate file, which causes the exception.
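
To make the failing key easier to inspect, here is a minimal Python sketch that
decodes the leading fields of the key dumped in the exception message. It is
not Hive code. It assumes that BinarySortableSerDe writes each ascending-order
int as one marker byte (1 = not null) followed by 4 big-endian bytes with the
sign bit flipped. The helper names parse_key and read_int are made up for this
illustration.

```python
# Key bytes copied verbatim from the "Unable to deserialize reduce input key"
# message in the issue description.
RAW_KEY = (
    "x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100"
    "x117x99x116x95x105x100x0x1x0x0x255"
)


def parse_key(text):
    """Turn the 'x<decimal>' dump format into a list of byte values."""
    return [int(tok) for tok in text.split("x") if tok]


def read_int(buf, pos):
    """Read one int under the ASSUMED layout: a marker byte (0 = null,
    1 = not null), then 4 big-endian bytes with the sign bit flipped.
    Returns (value, next_position)."""
    if buf[pos] == 0:
        return None, pos + 1
    raw = bytes(buf[pos + 1:pos + 5])
    if len(raw) < 4:
        raise EOFError("ran out of bytes while reading an int")
    value = int.from_bytes(raw, "big") ^ 0x80000000  # undo the sign-bit flip
    if value >= 0x80000000:  # reinterpret as a signed 32-bit value
        value -= 1 << 32
    return value, pos + 5


key = parse_key(RAW_KEY)
first, pos = read_int(key, 0)
second, pos = read_int(key, pos)
rest = key[pos:]
# Keep only printable ASCII from the remaining bytes to show embedded strings.
text = "".join(chr(b) for b in rest if 32 <= b < 127)
print(first, second, text)
```

Under that assumption, the first two fields decode to 20140814 and 2, and the
remaining bytes contain the string x_product_id. If that reading is right, the
bytes look like they are laid out as int, int, map rather than the declared
int, map<string,string>, int. This is only an observation from the dumped
bytes, not a confirmed diagnosis.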

With TextInputFormat and LazySimpleSerDe for the intermediate file, the 
exception goes away. However, changing the intermediate file's InputFormat and 
SerDe is not a preferred solution.

> hive.optimize.sort.dynamic.partition causes RuntimeException for inserting 
> into dynamic partitioned table when map function is used in the subquery 
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8162
>                 URL: https://issues.apache.org/jira/browse/HIVE-8162
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Na Yang
>         Attachments: 47rows.txt
>
>
> Exception:
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+++, columns.types=int,map<string,string>,int}
>       at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:462)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
>       at org.apache.hadoop.mapred.Child.main(Child.java:271)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+++, columns.types=int,map<string,string>,int}
>       at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222)
>       ... 7 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
>       at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:189)
>       at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:220)
>       ... 7 more
> Caused by: java.io.EOFException
>       at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
>       at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:533)
>       at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:236)
>       at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:185)
>       ... 8 more
> Steps to reproduce the exception:
> -------------------------------------------------
> CREATE TABLE associateddata(creative_id int,creative_group_id int,placement_id
> int,sm_campaign_id int,browser_id string, trans_type_p string,trans_time_p
> string,group_name string,event_name string,order_id string,revenue
> float,currency string, trans_type_ci string,trans_time_ci string,f16
> map<string,string>,campaign_id int,user_agent_cat string,geo_country
> string,geo_city string,geo_state string,geo_zip string,geo_dma string,geo_area
> string,geo_isp string,site_id int,section_id int,f16_ci map<string,string>)
> PARTITIONED BY(day_id int, hour_id int) ROW FORMAT DELIMITED FIELDS TERMINATED
> BY '\t';
> LOAD DATA LOCAL INPATH '/tmp/47rows.txt' INTO TABLE associateddata
> PARTITION(day_id=20140814,hour_id=2014081417);
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict; 
> CREATE  EXTERNAL TABLE IF NOT EXISTS agg_pv_associateddata_c (
>  vt_tran_qty             int                     COMMENT 'The count of view thru transactions'
> , pair_value_txt          string                  COMMENT 'F16 name values pairs'
> )
> PARTITIONED BY (day_id int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE
> LOCATION '/user/prodman/agg_pv_associateddata_c';
> INSERT INTO TABLE agg_pv_associateddata_c PARTITION (day_id)
> select 2 as vt_tran_qty, pair_value_txt, day_id
>  from (select map( 'x_product_id',coalesce(F16['x_product_id'],'') ) as 
> pair_value_txt , day_id , hour_id 
> from associateddata where hour_id = 2014081417 and sm_campaign_id in
> (10187171,1090942,10541943,10833443,8635630,10187170,9445296,10696334,11398585,9524211,1145211)
> ) a GROUP BY pair_value_txt, day_id;
> The query worked fine in Hive-0.12. It starts failing in Hive-0.13. If 
> hive.optimize.sort.dynamic.partition is turned off in Hive-0.13, the 
> exception is gone. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
