[ 
https://issues.apache.org/jira/browse/FLINK-25330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461651#comment-17461651
 ] 

Jing Ge commented on FLINK-25330:
---------------------------------

Hi Bruce,

Multi-version support is one of the core design features of HBase. From the 
HBase Delete API, we can see that the default deletion behaviour is to delete 
only the latest version.
{code:java}
// Deletes only the latest version of the specified column.
public Delete addColumn(final byte[] family, final byte[] qualifier)
{code}
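To make the difference concrete, here is a minimal sketch of the two deletion 
semantics. It is a toy in-memory model of a single multi-version cell, not the 
HBase client API: the class VersionedCell and its methods deleteLatest and 
deleteAll are hypothetical names chosen for illustration. In the real client, 
the corresponding calls are Delete.addColumn (latest version only) and 
Delete.addColumns (all versions).

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of one multi-version cell; NOT the HBase client API. */
public class VersionedCell {
    // timestamp -> value, sorted so that the newest version is the last entry
    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Mirrors the semantics of Delete.addColumn: drop only the newest version. */
    public void deleteLatest() {
        versions.pollLastEntry();
    }

    /** Mirrors the semantics of Delete.addColumns: drop every version. */
    public void deleteAll() {
        versions.clear();
    }

    /** Returns the newest remaining value, or null if no version is left. */
    public String get() {
        Map.Entry<Long, String> latest = versions.lastEntry();
        return latest == null ? null : latest.getValue();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(1L, "v1");
        cell.put(2L, "v2");

        cell.deleteLatest();
        System.out.println(cell.get()); // prints "v1": the older version is visible again

        cell.deleteAll();
        System.out.println(cell.get()); // prints "null": no version is left
    }
}
```

The first delete is what the reported scenario looks like: the old value 
reappears after the latest version is removed.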
As for the Flink case, HBase is not only used for CDC. It has been used with 
Flink in many different big data processing scenarios, such as user behaviour 
analytics and churn analytics, and it could be used in every phase of the AARRR 
model, including triggering persona-based promotion operations, where 
historical versions of the users' tracking data are consumed. Because the data 
is so important, physical deletions are generally converted to logical 
deletions. We could also look at it from a different direction: if every delete 
request should always delete all versions, why would HBase support multiple 
versions and provide this API in the first place? It would only consume more 
resources and provide no extra value.

Back to your scenario: since you want to delete all versions, it looks like you 
only need one version of each column. The simplest solution could therefore be 
to let the column store only 1 version. WDYT?

> Flink SQL doesn't retract all versions of Hbase data
> ----------------------------------------------------
>
>                 Key: FLINK-25330
>                 URL: https://issues.apache.org/jira/browse/FLINK-25330
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / HBase
>    Affects Versions: 1.14.0
>            Reporter: Bruce Wong
>            Assignee: Jing Ge
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: image-2021-12-15-20-05-18-236.png
>
>
> h2. Background
> When we use CDC to synchronize MySQL data to HBase, we find that HBase 
> deletes only the last version of the specified rowkey when the MySQL data is 
> deleted. The data of the old versions still exists, so you end up using the 
> wrong data. I think it's a bug in the HBase connector.
> The following figure shows the HBase data changes before and after the MySQL 
> data is deleted.
> !image-2021-12-15-20-05-18-236.png|width=910,height=669!
>  
> h2.  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
