[ https://issues.apache.org/jira/browse/HIVE-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258865#comment-17258865 ]
Mustafa İman commented on HIVE-24531:
-------------------------------------

This happens only when doing a vectorized scan over a table that is stored as TEXTFILE.

> Vectorized table scan ignores binary column
> -------------------------------------------
>
>                 Key: HIVE-24531
>                 URL: https://issues.apache.org/jira/browse/HIVE-24531
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mustafa İman
>            Priority: Major
>
> There is a binary field in the over1k dataset in the Hive codebase. Vectorized
> table scan ignores the binary field and passes it as NULL in all rows. The
> issue also affects insert queries, with both external and managed tables, when
> "hive.stats.autogather=false".
> To reproduce:
> Add "set hive.stats.autogather=false;" at the top of "vector_data_types.q".
> Run "mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_data_types.q".
> Observe that the "bin" column is all NULL when querying any of the tables.
>
> Below is a simplified version of the same test:
> {code:java}
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.stats.autogather=false;
> DROP TABLE over1k_n8;
> DROP TABLE over1korc_n1;
> -- data setup
> CREATE TABLE over1k_n8(t tinyint,
>            si smallint,
>            i int,
>            b bigint,
>            f float,
>            d double,
>            bo boolean,
>            s string,
>            ts timestamp,
>            `dec` decimal(4,2),
>            bin binary)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../../data/files/over1k' OVERWRITE INTO TABLE over1k_n8;
> analyze table over1k_n8 compute statistics;
> analyze table over1k_n8 compute statistics for columns;
> select * from over1k_n8 limit 10;
> select count(1) from over1k_n8 where bin is null;
> CREATE TABLE over1korc_n1(t tinyint,
>            si smallint,
>            i int,
>            b bigint,
>            f float,
>            d double,
>            bo boolean,
>            s string,
>            ts timestamp,
>            `dec` decimal(4,2),
>            bin binary)
> STORED AS ORC;
> explain vectorization detail
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> select count(1) from over1korc_n1 where bin is null;
> select * from over1korc_n1 limit 10;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
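
One possible way to check the observation in the comment (that the NULLs appear only under a vectorized scan of the TEXTFILE table) is to run the same count with vectorization switched off and then on. This is a minimal sketch, not part of the original report: it assumes the over1k_n8 table from the test above already exists and is loaded, and it relies on the standard hive.vectorized.execution.enabled setting. The results are not asserted here, since they were not verified.

{code:sql}
-- Hypothetical check, not taken from the original report.
-- Keep the settings from the reproduction that force a real scan task.
set hive.fetch.task.conversion=none;
set hive.stats.autogather=false;

-- Row-mode (non-vectorized) scan of the TEXTFILE table.
set hive.vectorized.execution.enabled=false;
select count(1) from over1k_n8 where bin is null;

-- Vectorized scan of the same TEXTFILE table.
set hive.vectorized.execution.enabled=true;
select count(1) from over1k_n8 where bin is null;
{code}

If the two counts differ, that would be consistent with the problem lying in the vectorized text-file read path rather than in the data file or the ORC writer.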