[jira] [Comment Edited] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

Gopal V (JIRA) Thu, 26 May 2016 21:55:52 -0700

    [ https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303480#comment-15303480 ]


Gopal V edited comment on HIVE-13872 at 5/27/16 4:54 AM:
---------------------------------------------------------

AFAIK, the issue is that the column pruner removes nearly all columns from
the TableScan, but the VectorizationContext is not aware of the pruned
needed-columns list, because there's no SEL operator in between to project
the 2 columns. As a result, the VectorReduceSinkOperator still projects all 9
table columns (0-8 in the log below), including columns that were never read.

{code}
2016-05-27T00:52:21,575 INFO  [IO-Elevator-Thread-22[attempt_1462788318414_0308_24_00_000002_3]]: LlapIoImpl (:()) - Processing data for hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_200.db/customer_demographics/000003_0
2016-05-27T00:52:21,613 WARN  [TezTaskRunner[attempt_1462788318414_0308_24_00_000001_3]]: vector.VectorReduceSinkOperator (:()) - Object inspectors = org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector<org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector@395562d2,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector@7cedb0fe,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector@7cedb0fe,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector@7cedb0fe,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector@395562d2,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector@7cedb0fe,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector@395562d2,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector@395562d2,org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector@395562d2>
2016-05-27T00:52:21,613 WARN  [TezTaskRunner[attempt_1462788318414_0308_24_00_000001_3]]: vector.VectorReduceSinkOperator (:()) - Projected columns = 0, 1, 2, 3, 4, 5, 6, 7, 8, 
2016-05-27T00:52:21,614 ERROR [TezTaskRunner[attempt_1462788318414_0308_24_00_000001_3]]: tez.MapRecordSource (:()) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
{code}
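A minimal Java sketch of the mismatch described in this comment, under stated assumptions. It models a batch as one Object[] per column and leaves pruned columns null; the class and method names (PrunedProjectionSketch, scan, extractRow, etc.) are hypothetical and are not Hive's VectorizedRowBatch, VectorizationContext or VectorExtractRow APIs. It assumes the reduce sink only needs cd_demo_sk and cd_marital_status (the value expressions in the plan from the issue description) and the standard TPC-DS customer_demographics column order, whose types line up with the nine object inspectors in the log. Projecting all nine columns hits the unread cd_gender column at index 1, mirroring "projection column num 1"; projecting only the needed columns works.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the failure; not Hive code.
 * Each column is an Object[] "vector"; columns removed by the column pruner
 * stay null, as if the scan never read them.
 */
public class PrunedProjectionSketch {

    // TPC-DS customer_demographics column order (int, string, string, string,
    // int, string, int, int, int), matching the object inspectors in the log.
    static final List<String> CUSTOMER_DEMOGRAPHICS = List.of(
        "cd_demo_sk", "cd_gender", "cd_marital_status", "cd_education_status",
        "cd_purchase_estimate", "cd_credit_rating", "cd_dep_count",
        "cd_dep_employed_count", "cd_dep_college_count");

    /** Maps the column names an operator needs to their positions in the schema. */
    static int[] neededColumns(List<String> schema, List<String> names) {
        int[] projection = new int[names.size()];
        for (int i = 0; i < names.size(); i++) {
            int idx = schema.indexOf(names.get(i));
            if (idx < 0) {
                throw new IllegalArgumentException("unknown column: " + names.get(i));
            }
            projection[i] = idx;
        }
        return projection;
    }

    /** Projection over every table column, like "Projected columns = 0, ..., 8". */
    static int[] allColumns(int numColumns) {
        int[] projection = new int[numColumns];
        for (int i = 0; i < numColumns; i++) {
            projection[i] = i;
        }
        return projection;
    }

    /** Simulates a pruned scan: only needed columns get a vector, the rest stay null. */
    static Object[][] scan(Object[][] rows, int numColumns, int[] needed) {
        Object[][] batch = new Object[numColumns][];
        for (int col : needed) {
            batch[col] = new Object[rows.length];
            for (int r = 0; r < rows.length; r++) {
                batch[col][r] = rows[r][col];
            }
        }
        return batch;
    }

    /** Returns the first projected column that was never read, or -1 if none. */
    static int firstUnreadColumn(Object[][] batch, int[] projection) {
        for (int col : projection) {
            if (batch[col] == null) {
                return col;
            }
        }
        return -1;
    }

    /** Extracts one row through the projection; fails on columns that were never read. */
    static List<Object> extractRow(Object[][] batch, int[] projection, int batchIndex) {
        int unread = firstUnreadColumn(batch, projection);
        if (unread >= 0) {
            throw new IllegalStateException(
                "null entry: batchIndex " + batchIndex + " projection column num " + unread);
        }
        List<Object> row = new ArrayList<>();
        for (int col : projection) {
            row.add(batch[col][batchIndex]);
        }
        return row;
    }

    public static void main(String[] args) {
        // Toy rows, for illustration only.
        Object[][] rows = {
            {1, "M", "M", "Primary", 500, "Good", 0, 0, 0},
            {2, "F", "U", "Secondary", 500, "Good", 0, 0, 0},
        };
        int[] needed = neededColumns(CUSTOMER_DEMOGRAPHICS, List.of("cd_demo_sk", "cd_marital_status"));
        Object[][] batch = scan(rows, CUSTOMER_DEMOGRAPHICS.size(), needed);
        try {
            extractRow(batch, allColumns(CUSTOMER_DEMOGRAPHICS.size()), 0);
        } catch (IllegalStateException e) {
            System.out.println("all columns:    " + e.getMessage());
        }
        System.out.println("needed columns: " + extractRow(batch, needed, 1));
    }
}
```

Running main prints the simulated failure for the all-columns projection and `[2, U]` for the needed-columns projection. Driving the projection from the pruned column list rather than defaulting to every table column is one way to avoid the mismatch; this sketch does not claim that is how the actual fix works. Requires Java 9+ for List.of.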


was (Author: gopalv):
AFAIK, the issue is that the column pruner removes nearly all columns from
the TableScan, but the VectorizationContext is not aware of the pruned
needed-columns list, because there's no SEL operator in between to project
the 3 columns.


> Vectorization: Fix cross-product reduce sink serialization
> ----------------------------------------------------------
>
>                 Key: HIVE-13872
>                 URL: https://issues.apache.org/jira/browse/HIVE-13872
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.0
>            Reporter: Gopal V
>
> With CBO disabled, nothing simplifies TPC-DS Q13, so it produces a cross-product, and the vectorized reduce sink then fails:
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 projection column num 1
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
>         ... 18 more
> {code}
> Simplified query
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)  
>  from store_sales
>      ,customer_demographics
>  where (
> ( 
>   customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'M'
>      )or
>      (
>    customer_demographics.cd_demo_sk = ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'U'
>      ))
> ;
> {code}
> {code}
>         Map 3 
>             Map Operator Tree:
>                 TableScan
>                   alias: customer_demographics
>                   Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     sort order: 
>                     Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>                     value expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
> {code}
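In the simplified query, the join predicate appears only inside the two OR branches, so there is no top-level equi-join condition; this is consistent with the Reduce Output Operator above having an empty sort order and no key. A minimal Java sketch of the kind of simplification that removes the cross-product: pull conjuncts shared by every branch out of the OR, turning (A AND B) OR (A AND C) into A AND (B OR C), which exposes cd_demo_sk = ss_cdemo_sk as a join key. The class CommonConjunctSketch is hypothetical and is not Hive's or Calcite's implementation. It assumes predicates are plain strings already normalized to identical text; in the query the two join predicates are spelled differently (store_sales.ss_cdemo_sk vs. ss_cdemo_sk) and would need name resolution first.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not Hive or Calcite code: factor conjuncts shared by every
 * branch of an OR, i.e. (A AND B) OR (A AND C) ==> A AND (B OR C).
 * Each disjunct is a list of conjuncts; predicates are strings assumed to be
 * normalized already (equal predicates have identical text).
 */
public class CommonConjunctSketch {

    /** Conjuncts present in every disjunct, in first-seen order. */
    static Set<String> commonConjuncts(List<List<String>> disjuncts) {
        Set<String> common = new LinkedHashSet<>(disjuncts.get(0));
        for (List<String> disjunct : disjuncts.subList(1, disjuncts.size())) {
            common.retainAll(disjunct);
        }
        return common;
    }

    /** Renders the OR-of-ANDs with the common conjuncts pulled out in front. */
    static String factor(List<List<String>> disjuncts) {
        Set<String> common = commonConjuncts(disjuncts);
        List<String> residuals = new ArrayList<>();
        for (List<String> disjunct : disjuncts) {
            List<String> rest = new ArrayList<>(disjunct);
            rest.removeAll(common);
            if (rest.isEmpty()) {
                // A OR (A AND B) == A: the residual OR is always true, so drop it.
                return String.join(" AND ", common);
            }
            residuals.add("(" + String.join(" AND ", rest) + ")");
        }
        String residualOr = String.join(" OR ", residuals);
        if (common.isEmpty()) {
            return residualOr; // nothing shared, nothing to factor out
        }
        return String.join(" AND ", common) + " AND (" + residualOr + ")";
    }

    public static void main(String[] args) {
        // WHERE clause of the simplified query, with both join predicates
        // normalized to the same text (an assumption, see above).
        List<List<String>> where = List.of(
            List.of("cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'M'"),
            List.of("cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'U'"));
        System.out.println(factor(where));
    }
}
```

For the simplified query this prints `cd_demo_sk = ss_cdemo_sk AND ((cd_marital_status = 'M') OR (cd_marital_status = 'U'))`: the equality now sits at the top level, where a planner can use it as a join key instead of shuffling every customer_demographics row to a single reducer. Requires Java 9+ for List.of.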



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
