[ 
https://issues.apache.org/jira/browse/FLINK-23730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl updated FLINK-23730:
-------------------------
    Attachment: image-2021-08-26-09-44-20-390.png

> Source from hive sink hbase lost data
> -------------------------------------
>
>                 Key: FLINK-23730
>                 URL: https://issues.apache.org/jira/browse/FLINK-23730
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / HBase, Connectors / Hive
>    Affects Versions: 1.12.1
>            Reporter: Carl
>            Priority: Major
>         Attachments: image-2021-08-26-09-43-39-055.png, 
> image-2021-08-26-09-44-20-390.png
>
>
> Our use case is as follows:
>  # Hive source: create a Hive table whose metadata is stored in HMS
>  # Create the HBase table using the HBase shell
>  # Flink SQL DDL: create the HBase Flink table
>  # Use the Hive catalog: use Flink SQL to insert into the HBase Flink table
> If I set the table config table.exec.hive.infer-source-parallelism = false,
> the program runs with a parallelism of one, and the number of records in the 
> result is correct.
> But if I set the table config table.exec.hive.infer-source-parallelism = true,
> the program runs with a parallelism of twenty (the source parallelism is 
> inferred from the number of splits), and the number of records in the result 
> is not correct.
>  
> The test was repeated many times and no exception occurred.
>  
> So I guess it is related to the high concurrency. Is data being lost 
> because of the high concurrency?
>  
>  
>  
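One way to narrow down which records go missing is to compare the row keys read from the Hive table with the row keys that end up in HBase, once for each parallelism setting. The following is only a minimal sketch, assuming the keys from both sides have already been exported into plain Python lists; the function name `compare_row_keys` and the sample keys are hypothetical and not part of Flink, Hive, or HBase. Because HBase keeps a single row per row key, the sketch also reports source keys that occur more than once.

```python
from collections import Counter


def compare_row_keys(source_keys, sink_keys):
    """Summarize how the sink's row keys differ from the source's.

    source_keys: row keys exported from the Hive table (assumed input).
    sink_keys:   row keys exported from the HBase table (assumed input).
    """
    source_counts = Counter(source_keys)
    sink_set = set(sink_keys)
    return {
        # Total rows read from the source, duplicates included.
        "source_rows": sum(source_counts.values()),
        # Number of different row keys in the source.
        "distinct_source_keys": len(source_counts),
        # Number of different row keys found in the sink.
        "sink_rows": len(sink_set),
        # Keys present in the source but absent from the sink.
        "missing_in_sink": sorted(k for k in source_counts if k not in sink_set),
        # Keys that appear more than once in the source.
        "duplicate_source_keys": sorted(
            k for k, n in source_counts.items() if n > 1
        ),
    }


if __name__ == "__main__":
    # Hypothetical sample data, only to show the shape of the report.
    report = compare_row_keys(["a", "b", "b", "c"], ["a", "b"])
    for name, value in report.items():
        print(name, value)
```

Running the comparison against the output of the one-parallelism job and the twenty-parallelism job would show whether the same keys are missing each time or whether the missing set varies between runs.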



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
