[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573987#comment-13573987 ]
Brock Noland commented on HIVE-3179:
------------------------------------

I have verified this is an issue with trunk, the patch applies, and the patch addresses the issue.

{noformat}
hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201302071609_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201302071609_0002
Kill Command = /opt/local/hadoop-1.1.1/libexec/../bin/hadoop job -kill job_201302071609_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-02-07 16:10:31,826 Stage-1 map = 0%, reduce = 0%
2013-02-07 16:10:34,846 Stage-1 map = 100%, reduce = 0%
2013-02-07 16:10:36,861 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201302071609_0002
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 260 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
c1-1 c3-1 c2-1 c2-1 c3-1 c3-1 c1-1
c1-2 NULL NULL NULL NULL NULL c1-2
Time taken: 10.702 seconds, Fetched: 2 row(s)
hive>
{noformat}

> HBase Handler doesn't handle NULLs properly
> -------------------------------------------
>
> Key: HIVE-3179
> URL: https://issues.apache.org/jira/browse/HIVE-3179
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Affects Versions: 0.9.0
> Reporter: Lars Francke
> Priority: Critical
> Attachments: HIVE-3179.1.patch
>
>
> We found a quite severe issue in the HBase Handler which actually means that
> Hive potentially returns incorrect data if a column has NULL values in HBase
> (which means the cell doesn't even exist)
> In HBase Shell:
> {noformat}
> create 'hive_hbase_test', 'test'
> put 'hive_hbase_test', '1', 'test:c1', 'c1-1'
> put 'hive_hbase_test', '1', 'test:c2', 'c2-1'
> put 'hive_hbase_test', '1', 'test:c3', 'c3-1'
> put 'hive_hbase_test', '2', 'test:c1', 'c1-2'
> {noformat}
> In Hive:
> {noformat}
> DROP TABLE IF EXISTS hive_hbase_test;
> CREATE EXTERNAL TABLE hive_hbase_test (
>   id int,
>   c1 string,
>   c2 string,
>   c3 string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key#s,test:c1#s,test:c2#s,test:c3#s")
> TBLPROPERTIES("hbase.table.name" = "hive_hbase_test");
> hive> select * from hive_hbase_test;
> OK
> 1 c1-1 c2-1 c3-1
> 2 c1-2 NULL NULL
> hive> select c1 from hive_hbase_test;
> c1-1
> c1-2
> hive> select c1, c2 from hive_hbase_test;
> c1-1 c2-1
> c1-2 NULL
> {noformat}
> So far everything is correct but now:
> {noformat}
> hive> select c1, c2, c2 from hive_hbase_test;
> c1-1 c2-1 c2-1
> c1-2 NULL c2-1
> {noformat}
> Selecting c2 twice works the first time but the second time we
> actually get the value from the previous row.
> {noformat}
> hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
> c1-1 c3-1 c2-1 c2-1 c3-1 c3-1 c1-1
> c1-2 NULL NULL c2-1 c3-1 c3-1 c1-2
> {noformat}
> We've narrowed this down to an early initialization of
> {{fieldsInited\[fieldID] = true}} in {{LazyHBaseRow#uncheckedGetField}} and
> we'll try to provide a patch which surely needs review.
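The stale-value behaviour traced to the early `fieldsInited[fieldID] = true` in `LazyHBaseRow#uncheckedGetField` can be modeled outside Hive. The sketch below is hypothetical and self-contained; it is not the actual `LazyHBaseRow` code or the HIVE-3179.1.patch. It assumes a single row object reused across scanned rows, a per-field `fieldsInited` flag that guards re-deserialization, and a missing HBase cell that yields NULL without touching the cached field object. The names `HBaseNullCacheSketch`, `LazyRow`, `getField`, `scan` and the `earlyInit` switch are illustrative only. If the flag is set before checking whether the cell exists, the second reference to the same column within a row returns the field object left over from the previous row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a lazily deserialized, reused row. NOT Hive's LazyHBaseRow.
class HBaseNullCacheSketch {

    // Stand-in for a lazy field object; one instance per column, reused across rows.
    static final class LazyField {
        String value;
        void init(String raw) { value = raw; }
    }

    static final class LazyRow {
        private final String[] columns;        // "family:qualifier" for each Hive column
        private final LazyField[] fields;      // cached field objects, survive across rows
        private final boolean[] fieldsInited;  // per-row "already deserialized" flags
        private final boolean earlyInit;       // true reproduces the buggy ordering
        private Map<String, String> cells;     // current row; an absent key means no cell

        LazyRow(String[] columns, boolean earlyInit) {
            this.columns = columns;
            this.fields = new LazyField[columns.length];
            this.fieldsInited = new boolean[columns.length];
            this.earlyInit = earlyInit;
        }

        // Point the reused row at the next HBase result and clear the per-row flags.
        void init(Map<String, String> cells) {
            this.cells = cells;
            Arrays.fill(fieldsInited, false);
        }

        // Returns the column value, or null (SQL NULL) when the cell is absent.
        String getField(int fieldID) {
            if (!fieldsInited[fieldID]) {
                if (earlyInit) {
                    fieldsInited[fieldID] = true; // BUG: marked before the cell is known to exist
                }
                String raw = cells.get(columns[fieldID]);
                if (raw == null) {
                    return null; // no cell: NULL, but the cached field object is left untouched
                }
                if (fields[fieldID] == null) {
                    fields[fieldID] = new LazyField();
                }
                fields[fieldID].init(raw);
                fieldsInited[fieldID] = true; // safe ordering: only after a cell was found
            }
            // With earlyInit, a missing cell reaches here on the second access
            // and returns whatever the previous row left in the cached object.
            return fields[fieldID] == null ? null : fields[fieldID].value;
        }
    }

    // Evaluates one projection list against the current row, printing NULL like the Hive CLI.
    static String project(LazyRow row, int[] select) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < select.length; i++) {
            if (i > 0) sb.append(' ');
            String v = row.getField(select[i]);
            sb.append(v == null ? "NULL" : v);
        }
        return sb.toString();
    }

    // Replays the two rows from the report through a single reused LazyRow.
    static List<String> scan(boolean earlyInit) {
        String[] columns = {"test:c1", "test:c2", "test:c3"};
        Map<String, String> row1 = new HashMap<>();
        row1.put("test:c1", "c1-1");
        row1.put("test:c2", "c2-1");
        row1.put("test:c3", "c3-1");
        Map<String, String> row2 = new HashMap<>();
        row2.put("test:c1", "c1-2"); // c2 and c3 cells do not exist
        int[] select = {0, 2, 1, 1, 2, 2, 0}; // select c1, c3, c2, c2, c3, c3, c1
        LazyRow reused = new LazyRow(columns, earlyInit);
        List<String> out = new ArrayList<>();
        for (Map<String, String> cells : Arrays.asList(row1, row2)) {
            reused.init(cells);
            out.add(project(reused, select));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("early init: " + scan(true));
        System.out.println("late init:  " + scan(false));
    }
}
```

In this model, `scan(true)` yields `c1-2 NULL NULL c2-1 c3-1 c3-1 c1-2` for the second row, the same incorrect output as in the report, while setting the flag only after a cell has been found (`scan(false)`) yields `c1-2 NULL NULL NULL NULL NULL c1-2`, matching the patched output in the comment above. Whether the real patch fixes it exactly this way is not stated here.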