Carl created FLINK-23730:
----------------------------

             Summary: Source from hive sink hbase lost data
                 Key: FLINK-23730
                 URL: https://issues.apache.org/jira/browse/FLINK-23730
             Project: Flink
          Issue Type: Bug
          Components: Connectors / HBase, Connectors / Hive
    Affects Versions: 1.12.1
            Reporter: Carl


Our use case is as follows:
 # Hive source: create a Hive table whose metadata is stored in HMS
 # Create the HBase table using the HBase shell
 # Flink SQL DDL: create a Flink table mapped to the HBase table
 # Using the Hive catalog, run a Flink SQL INSERT INTO the HBase-backed Flink table
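
For illustration, steps 3 and 4 might look roughly like the Flink SQL below. This is only a sketch: the catalog name, table names, column family, columns, and ZooKeeper address are all hypothetical and not taken from the actual job, and the connector version ('hbase-1.4') is an assumption.

```sql
-- Hypothetical names throughout; adjust to the real environment.
-- Assumes a Hive catalog named myhive has already been registered.
USE CATALOG myhive;

-- Step 3: Flink table mapped onto the existing HBase table.
CREATE TABLE hbase_sink (
  rowkey STRING,
  cf ROW<col1 STRING>,                  -- one column family with one qualifier
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-1.4',            -- assumption; 'hbase-2.2' is the other option in Flink 1.12
  'table-name' = 'default:test_table',  -- hypothetical HBase namespace:table
  'zookeeper.quorum' = 'zk-host:2181'   -- hypothetical ZooKeeper address
);

-- Step 4: read from the Hive table and write into HBase.
INSERT INTO hbase_sink
SELECT id, ROW(col1) FROM hive_source;  -- hive_source is a hypothetical Hive table
```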

If I set the table config table.exec.hive.infer-source-parallelism = false,
the program runs with a parallelism of 1, and the number of result records is 
correct.

But if I set the table config table.exec.hive.infer-source-parallelism = true,
the program runs with a parallelism of 20, meaning the source parallelism is 
inferred from the number of splits, and the number of result records is not 
correct.
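
The two runs can be switched in the SQL client roughly as sketched below. The table name hive_source is hypothetical, and the cap value of 4 is only an example; table.exec.hive.infer-source-parallelism.max is an existing Hive connector option that limits the inferred parallelism, which may help narrow down at which parallelism records start to go missing.

```sql
-- Run 1: inference disabled, the source uses the configured default parallelism.
SET table.exec.hive.infer-source-parallelism=false;

-- Run 2: inference enabled, the source parallelism follows the number of splits.
SET table.exec.hive.infer-source-parallelism=true;

-- Optional intermediate run: keep inference on but cap the inferred parallelism.
-- The value 4 is an arbitrary example.
SET table.exec.hive.infer-source-parallelism.max=4;

-- Source-side row count to compare against the HBase row count.
-- hive_source is a hypothetical table name.
SELECT COUNT(*) FROM hive_source;
```

On the sink side, the row count could be checked with the HBase shell count command on the target table.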

 

The test was repeated many times, and no exception occurred.

 

So I suspect this is related to the high concurrency. Could data be lost 
because of it?


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
