Jean-Pierre Hoang created HIVE-21936:
----------------------------------------
             Summary: Snapshot inconsistency plan execution
                 Key: HIVE-21936
                 URL: https://issues.apache.org/jira/browse/HIVE-21936
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler
    Affects Versions: 2.3.5, 2.3.4, 3.1.1, 3.1.0, 2.3.2, 2.3.1, 3.0.0, 2.3.0, 2.2.0, 2.1.1, 2.1.0, 2.0.1, 1.2.2, 2.0.0, 1.2.1, 1.1.1
            Reporter: Jean-Pierre Hoang

When an HBase snapshot is used from Hive, nothing validates that the snapshot exists or that it applies to the Hive target table.

How to reproduce:

Create two Hive tables backed by HBase:
{code:java}
CREATE TABLE default.employee(rowkey string, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping"= "cf:string",
  "hbase.table.name"= "default:employee"
);

CREATE TABLE default.work(rowkey string, company string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping"= "cf:string",
  "hbase.table.name"= "default:work"
);
{code}
Insert some rows into the tables:
{code:java}
INSERT INTO TABLE default.employee values("1", "Dupont");
INSERT INTO TABLE default.work values ("c1", "ACME");
{code}
From the HBase shell, create a snapshot:
{code:java}
snapshot 'employee', 'mysnapshot'
{code}
From beeline, run a sanity check:
{code:java}
SELECT * FROM employee;
SELECT * FROM work;
{code}
Now that the setup is done, the first bug appears when the snapshot name is set in Hive and a different HBase table is queried:
{code:java}
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM work;
{code}
The problem is the condition that triggers the snapshot input format. It only checks that a snapshot name is set, not which table is being read, so the query on work is also routed to the snapshot input format:
{code:java}
@Override
public Class<? extends InputFormat> getInputFormatClass() {
  // Only tests whether the property is set; the target table is never consulted.
  if (HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) != null) {
    LOG.debug("Using TableSnapshotInputFormat");
    return HiveHBaseTableSnapshotInputFormat.class;
  }
  LOG.debug("Using HiveHBaseTableInputFormat");
  return HiveHBaseTableInputFormat.class;
}
{code}
The second problem is predicate pushdown when the snapshot is used in a query more complex than a simple select:
{code:java}
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM employee a UNION ALL SELECT * FROM employee b;
{code}
The result is not what we expect: every column other than the rowkey is null.

As a result, we cannot really use the snapshot feature for use cases that need analytic computation (full scan).
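To make the missing validation concrete, here is a minimal, self-contained sketch of the kind of check the report asks for. It is not Hive or HBase code: the class SnapshotGuard, the Decision enum, and the map standing in for the snapshot catalog are all hypothetical names introduced for illustration. In a real storage handler the snapshot-to-table lookup would have to come from HBase itself, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Hive code: decides whether a configured snapshot
 * may be used when reading a given table. The snapshot catalog is modeled as
 * a plain map from snapshot name to the table the snapshot was taken from.
 */
class SnapshotGuard {

    /** Which input path a read should take. */
    enum Decision { USE_LIVE_TABLE, USE_SNAPSHOT }

    static Decision decide(String snapshotName, String targetTable,
                           Map<String, String> snapshotToTable) {
        // No snapshot configured: read the live table, as today.
        if (snapshotName == null || snapshotName.isEmpty()) {
            return Decision.USE_LIVE_TABLE;
        }
        // Validation 1: the snapshot must exist.
        String snapshotTable = snapshotToTable.get(snapshotName);
        if (snapshotTable == null) {
            throw new IllegalArgumentException(
                "Snapshot '" + snapshotName + "' does not exist");
        }
        // Validation 2: the snapshot must belong to the table being read.
        // Assumption of this sketch: other tables fall back to a live read.
        if (!snapshotTable.equals(targetTable)) {
            return Decision.USE_LIVE_TABLE;
        }
        return Decision.USE_SNAPSHOT;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        catalog.put("mysnapshot", "employee");

        System.out.println(decide("mysnapshot", "employee", catalog));
        System.out.println(decide("mysnapshot", "work", catalog));
        System.out.println(decide(null, "work", catalog));
    }
}
```

With the tables from the reproduction steps, the snapshot path is chosen only for employee, while work is still read live. Falling back to a live read on a table mismatch is one possible design; rejecting the query with an error would be equally defensible and would surface a misconfigured session earlier.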