[DISCUSS] HBase backed table gives different results for * and key based queries

Denys Kuzmenko Mon, 03 Sep 2018 02:09:54 -0700

Hi All,

I would like to ask for your advice with below use case:


While working with Hive we came across the situation when Hive produces
inconsistent results (row counts are different)

*Reproduce steps:*

1) create schema and populate with data;

CREATE TABLE `a`(
  `key` string COMMENT 'from deserializer',
  `a` bigint COMMENT 'from deserializer',
  `b` bigint COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key, CF1:A, CF2:B',
  'serialization.format'='1')
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false',
  'hbase.table.default.storage.type'='binary',
  'hbase.table.name'='A');

CREATE EXTERNAL TABLE `b`(
  `key` string COMMENT 'from deserializer',
  `b` bigint COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key, CF2:B',
  'serialization.format'='1')
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false',
  'hbase.table.default.storage.type'='binary',
  'hbase.table.name'='A');

INSERT INTO A VALUES (1,null,1),(2,2,null),(3,3,3);


2) issue the following queries:

0: jdbc:hive2://localhost:10000> SELECT key from b;[...]
+------+--+
| key  |
+------+--+
| 1    |
| 2    |
| 3    |
+------+--+
3 rows selected (0.337 seconds)
0: jdbc:hive2://localhost:10000> SELECT * from b;[..]
+--------+------+--+
| b.key  | b.b  |
+--------+------+--+
| 1      | 1    |
| 3      | 3    |
+--------+------+--+
2 rows selected (0.401 seconds)


Rows with *NULL* values will be ignored by HBASE:



In case of `*select*` by `*rowkey*`, *HiveHBaseInputFormatUtil.java*
constructs following SCAN object, that doesn't consider column families:



When extending SCAN object with column families results become consistent:



*Questions:*

1. Do you think it's a real issue or just an invalid use case?
2. If it is and issue, would it be enough to extend SCAN object with column
families?

Thank you!

Best Regards,
Denys

[DISCUSS] HBase backed table gives different results for * and key based queries

Reply via email to