[ 
https://issues.apache.org/jira/browse/HIVE-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258865#comment-17258865
 ] 

Mustafa İman commented on HIVE-24531:
-------------------------------------

This happens only during a vectorized scan of a table that is stored as 
TEXTFILE.
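
A quick way to check this, sketched under the assumption that the over1k_n8 
table from the reproduction script in the issue description already exists, 
is to run the same NULL count with vectorization enabled and then disabled 
and compare the two results:

{code:sql}
-- Assumes over1k_n8 (STORED AS TEXTFILE) was created and loaded
-- as in the reproduction script from the issue description.
set hive.fetch.task.conversion=none;

-- Vectorized scan: the path on which "bin" is reported as all NULL.
set hive.vectorized.execution.enabled=true;
select count(1) from over1k_n8 where bin is null;

-- Row-mode (non-vectorized) scan, for comparison.
set hive.vectorized.execution.enabled=false;
select count(1) from over1k_n8 where bin is null;
{code}

If the two counts differ, that is consistent with the problem being specific 
to the vectorized read path.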

> Vectorized table scan ignores binary column
> -------------------------------------------
>
>                 Key: HIVE-24531
>                 URL: https://issues.apache.org/jira/browse/HIVE-24531
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mustafa İman
>            Priority: Major
>
> The over1k dataset in the Hive codebase contains a binary field. A vectorized 
> table scan ignores this field and returns it as NULL in every row. The issue 
> also affects insert queries, on both external and managed tables, when 
> "hive.stats.autogather=false". 
> To reproduce:
> Add "set hive.stats.autogather=false;" at the top of "vector_data_types.q".
> Run "mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_data_types.q".
> Observe that the "bin" column is all NULL when querying any of the tables.
>  
> Below is a simplified version of the same test:
> {code:sql}
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.stats.autogather=false;
> DROP TABLE over1k_n8;
> DROP TABLE over1korc_n1;
> -- data setup
> CREATE TABLE over1k_n8(t tinyint,
>            si smallint,
>            i int,
>            b bigint,
>            f float,
>            d double,
>            bo boolean,
>            s string,
>            ts timestamp,
>            `dec` decimal(4,2),
>            bin binary)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../../data/files/over1k' OVERWRITE INTO TABLE over1k_n8;
> analyze table over1k_n8 compute statistics;
> analyze table over1k_n8 compute statistics for columns;
> select * from over1k_n8 limit 10;
> select count(1) from over1k_n8 where bin is null;
> CREATE TABLE over1korc_n1(t tinyint,
>            si smallint,
>            i int,
>            b bigint,
>            f float,
>            d double,
>            bo boolean,
>            s string,
>            ts timestamp,
>            `dec` decimal(4,2),
>            bin binary)
> STORED AS ORC;
> explain vectorization detail
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> select count(1) from over1korc_n1 where bin is null;
> select * from over1korc_n1 limit 10;
> {code}



