Jean-Pierre Hoang created HIVE-21936:
----------------------------------------

             Summary: Snapshot inconsistency plan execution
                 Key: HIVE-21936
                 URL: https://issues.apache.org/jira/browse/HIVE-21936
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler
    Affects Versions: 2.3.5, 2.3.4, 3.1.1, 3.1.0, 2.3.2, 2.3.1, 3.0.0, 2.3.0, 
2.2.0, 2.1.1, 2.1.0, 2.0.1, 1.2.2, 2.0.0, 1.2.1, 1.1.1
            Reporter: Jean-Pierre Hoang


When using an HBase snapshot from Hive, there is no validation that the 
snapshot exists, nor that the snapshot applies to the target Hive table.

How to reproduce:

Create two Hive tables backed by HBase:
 
{code:java}
CREATE TABLE default.employee(rowkey string, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", 
"hbase.table.name"= "default:employee" );

CREATE TABLE default.work(rowkey string, company string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", 
"hbase.table.name"= "default:work" );  {code}
 
Insert a row into each table:
 
{code:java}
INSERT INTO TABLE default.employee values("1", "Dupont");
INSERT INTO TABLE default.work values ("c1", "ACME");{code}
 
 
From the HBase shell, create a snapshot of the employee table:
{code:java}
snapshot 'employee', 'mysnapshot'{code}
 
From Beeline, run a sanity check:
{code:java}
SELECT * FROM employee;
SELECT * FROM work;
{code}
Now that the setup is done, the first bug appears when the snapshot name is 
set in Hive and a different HBase-backed table is queried:
 
{code:java}
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM work;{code}
The problem is the condition that triggers the snapshot input format. It only 
checks whether the property is set, not which table is being queried:
{code:java}
  @Override
  public Class<? extends InputFormat> getInputFormatClass() {
    if (HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) != null) {
      LOG.debug("Using TableSnapshotInputFormat");
      return HiveHBaseTableSnapshotInputFormat.class;
    }
    LOG.debug("Using HiveHBaseTableInputFormat");
    return HiveHBaseTableInputFormat.class;
  }{code}
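One possible shape for the missing validation is sketched below. This is only an illustration, not the Hive implementation: the class SnapshotGuard, its decide method, and the snapshot-to-table map are all hypothetical names introduced here. In a real fix the map would presumably be filled from HBase snapshot metadata; that lookup is left out so the sketch stays self-contained.

```java
import java.util.Map;

/**
 * Hypothetical guard (not part of Hive): decides whether the configured
 * snapshot may be used for the HBase table behind the current query.
 */
class SnapshotGuard {

    enum Decision { USE_LIVE_TABLE, USE_SNAPSHOT }

    /**
     * @param snapshotName   value of hive.hbase.snapshot.name, may be null or empty
     * @param queriedTable   HBase table backing the Hive table being queried
     * @param snapshotTables assumed lookup: snapshot name -> table it was taken from
     */
    static Decision decide(String snapshotName, String queriedTable,
                           Map<String, String> snapshotTables) {
        // No snapshot configured: read the live table, as today.
        if (snapshotName == null || snapshotName.isEmpty()) {
            return Decision.USE_LIVE_TABLE;
        }
        // The snapshot must exist.
        String sourceTable = snapshotTables.get(snapshotName);
        if (sourceTable == null) {
            throw new IllegalArgumentException(
                "Snapshot '" + snapshotName + "' does not exist");
        }
        // The snapshot must have been taken from the table being queried.
        if (!sourceTable.equals(queriedTable)) {
            throw new IllegalArgumentException(
                "Snapshot '" + snapshotName + "' belongs to table '" + sourceTable
                + "', not '" + queriedTable + "'");
        }
        return Decision.USE_SNAPSHOT;
    }
}
```

With the tables from this report, decide("mysnapshot", "employee", ...) would select the snapshot, while decide("mysnapshot", "work", ...) would fail fast instead of silently reading the wrong data. Whether a mismatch should raise an error or fall back to the live table is a design choice this sketch does not settle.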
 
The second problem concerns predicate pushdown when the snapshot is used in a 
query more complex than a simple SELECT:
{code:java}
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM employee a UNION ALL SELECT * FROM employee b;{code}
The result is not what we expect: every column other than the rowkey is 
null.
 
As a result, we cannot really use the snapshot feature for use cases that need 
analytic computation (full scan).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
