[ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755256#comment-13755256 ]
Sushanth Sowmyan commented on HIVE-4969:
----------------------------------------

Hi, could you please attach a test case that covers this as well? That way, the tests (including your test) fail without your fix and succeed with your fix.

Also, as a general note, the HBaseHCatStorageHandler is about to be deprecated in favour of Hive's HBaseStorageHandler with HIVE-4331.

> HCatalog HBaseHCatStorageHandler is not returning all the data
> --------------------------------------------------------------
>
>                 Key: HIVE-4969
>                 URL: https://issues.apache.org/jira/browse/HIVE-4969
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.11.0
>            Reporter: Venki Korukanti
>            Priority: Critical
>             Fix For: 0.11.1, 0.12.0
>
>         Attachments: HIVE-4969-1.patch
>
>
> Repro steps:
> 1) Create an HCatalog table mapped to HBase table.
> hcat -e "CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
> STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
> TBLPROPERTIES('hbase.table.name' ='studentHBase',
> 'hbase.columns.mapping' =
> ':key,onecf:name,twocf:age,threecf:gpa')";
> 2) Load the following data from Pig.
> cat student_data
> 1^Asarah laertes^A23^A2.40
> 2^Atom allen^A72^A1.57
> 3^Abob ovid^A61^A2.67
> 4^Aethan nixon^A38^A2.15
> 5^Acalvin robinson^A28^A2.53
> 6^Airene ovid^A65^A2.56
> 7^Ayuri garcia^A36^A1.65
> 8^Acalvin nixon^A41^A1.04
> 9^Ajessica davidson^A48^A2.11
> 10^Akatie king^A39^A1.05
> grunt> A = LOAD 'student_data' AS
> (rownum:int,name:chararray,age:int,gpa:float);
> grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
> 3) Now from HBase do a scan on the studentHBase table
> hbase(main):026:0> scan 'studentHBase', {LIMIT => 5}
> 4) From Pig, access the data in the table
> grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
> grunt> STORE A INTO '/user/root/studentPig';
> 5) Verify the output written to studentPig
> hadoop fs -cat /user/root/studentPig/part-r-00000
> 1 23
> 2 72
> 3 61
> 4 38
> 5 28
> 6 65
> 7 36
> 8 41
> 9 48
> 10 39
> The data returned has only two fields (rownum and age).
> Problem:
> While reading the data from the HBase table, HbaseSnapshotRecordReader gets
> each data row as a Result (org.apache.hadoop.hbase.client.Result) object and
> processes the KeyValue fields in it. After processing, it creates another
> Result object out of the processed KeyValue array. The problem is that this
> KeyValue array is not sorted, while the Result object expects its input
> KeyValue array to be sorted. When we call Result.getValue(), it returns no
> value for some of the fields because it does a binary search on an unordered
> array.
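
The failure mode described under "Problem" can be reproduced without HBase. The following is a minimal sketch, not the code from the attached patch: it stands in plain strings for KeyValue entries and uses java.util.Arrays.binarySearch in place of the lookup inside Result. The class name SortedLookupSketch and its method contains are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for the lookup described above: column names
// replace KeyValue entries, and Arrays.binarySearch replaces the binary
// search that Result.getValue() is reported to perform.
final class SortedLookupSketch {

    // Returns true if binary search locates the column in the array.
    // The answer is only reliable when the array is sorted.
    static boolean contains(String[] cells, String column) {
        return Arrays.binarySearch(cells, column) >= 0;
    }

    public static void main(String[] args) {
        // Columns in table-mapping order, which is NOT lexicographic order
        // ("threecf:gpa" sorts before "twocf:age").
        String[] cells = {"onecf:name", "twocf:age", "threecf:gpa"};

        // On the unsorted array the search misses an entry that is present.
        System.out.println("unsorted, gpa found: " + contains(cells, "threecf:gpa"));

        // Sorting a copy first makes every lookup succeed.
        String[] sorted = cells.clone();
        Arrays.sort(sorted);
        System.out.println("sorted, gpa found: " + contains(sorted, "threecf:gpa"));
    }
}
```

In HBase terms, the analogous step would presumably be to sort the processed KeyValue array (for example with KeyValue.COMPARATOR) before constructing the new Result; whether the attached patch takes that approach is not stated in this message.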