[ https://issues.apache.org/jira/browse/FLINK-19706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lsw_aka_laplace updated FLINK-19706:
------------------------------------
    Attachment: image-2020-10-19-17-00-27-255.png

> Introduce `Repeated Partition Commit Check` in `org.apache.flink.table.filesystem.PartitionCommitPolicy`
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19706
>                 URL: https://issues.apache.org/jira/browse/FLINK-19706
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Connectors / Hive, Table SQL / Runtime
>            Reporter: Lsw_aka_laplace
>            Priority: Minor
>         Attachments: image-2020-10-19-16-47-39-354.png, image-2020-10-19-16-57-02-661.png, image-2020-10-19-17-00-27-255.png
>
>
> Hi all,
>       Recently we have been working on using Hive streaming writes to
> speed up data synchronization into our Hive-based data warehouse, and we
> eventually got it working.
>       For production use, we added many metrics, logs, and other measures
> to help us analyze runtime information and fix unexpected problems. Among
> these, we found that checking for repeated partition commits is the most
> important one, so we would like to contribute it back to the community.
>       If this proposal makes sense, I am happy to present my design and
> implementation.
>
> Looking forward to ANY opinion~
>
>
> ---- UPDATE ----
>
> One of our users (who builds his own Flink jobs on our platform) raised
> several requests. One of them is that once a partition is committed, the
> data in that partition should be regarded as frozen or complete.
> [Committing a partition] works as a kind of guarantee (though we all know it
> is hard to make it a promise) that the partition is complete. We certainly
> take many measures to ensure that [partition commit means complete]. So if
> a partition is committed two or more times, then for us something must be
> wrong or our measures are insufficient. On the other hand, it also tells us
> to take compensating action to avoid data loss or incomplete data.
>
> So first of all, it is important for us to know when a certain partition is
> committed repeatedly, so that we can do the following ASAP:
>  1. analyze the cause
>  2. do some trade-off operations
>  3. improve our code/measures
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)