[ 
https://issues.apache.org/jira/browse/FLINK-19706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lsw_aka_laplace updated FLINK-19706:
------------------------------------
    Attachment: image-2020-10-19-17-00-27-255.png

> Introduce `Repeated Partition Commit Check` in 
> `org.apache.flink.table.filesystem.PartitionCommitPolicy` 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19706
>                 URL: https://issues.apache.org/jira/browse/FLINK-19706
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Connectors / Hive, Table SQL / 
> Runtime
>            Reporter: Lsw_aka_laplace
>            Priority: Minor
>         Attachments: image-2020-10-19-16-47-39-354.png, 
> image-2020-10-19-16-57-02-661.png, image-2020-10-19-17-00-27-255.png
>
>
> Hi all,
>       Recently we have been devoted to using Hive Streaming Writing to 
> accelerate our data-sync of Data Warehouse based on Hive, and eventually we 
> made it. 
>        For producing purpose, a lot of metrics/logs/measures were added in 
> order to help us analyze running info or fix some unexpected problems. Among 
> these mentioned above, we found that Checking Repeated Partition Commit is 
> the most important one. So here, we are willing to make a contribution of 
> introducing this backwards to Community.
>      If this proposal is meaning, I am happy to introduce my design and 
> implementation.
>  
> Looking forward to ANY opinion~
>  
>  
> ----UPDATE ----
>  
> Our user(using our own platform to build his own Flink job)raised some 
> Requests. One of the requests is that once the parition is commited, the data 
> in this partitio is regarded as frozen or completed. [Commiting partition] 
> seem like a gurantee(but we all know it is hard to be a promise) in some way 
> which tells us this partition is completed. Certainly, we make a lot of 
> measures try to achieve that [partition-commit means completed]. So if a 
> partition is committed twice or more times, for us, there must be sth wrong 
> or our measures are insufficent.  On the other hand, it also inform us to do 
> sth to make up to avoid data-loss or data-incompletion.  
>  
> So first of all, it is important to let us or help us know that certain 
> partition is committed repeatedly. So that we can do the following things ASAP
>    1. analyze the reason or the cause 
>    2. do some trade-off operations
>    3. improve our code/measures
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to