[ https://issues.apache.org/jira/browse/FLINK-22472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350817#comment-17350817 ]

luoyuxia commented on FLINK-22472:
----------------------------------

I think this problem can have two causes:

1: Although the partition is committable according to the partition commit 
policy you configure, there may still be data remaining to be written to this 
partition. In this case, you may need to check your partition commit policy.

2: Currently, the StreamingFileWriter is not aware that the partition is 
committable and therefore does not commit all files in this partition. So 
although the partition has been committed, the files in this partition haven't 
been committed. In this case, we can modify the logic of StreamingFileWriter 
to alleviate the problem.
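
For cause 1, the settings to review are the partition-commit options on the 
sink table. A minimal sketch (the table name, columns, and path are made up 
for illustration; the option names are from the filesystem connector docs):

{code:sql}
CREATE TABLE fs_partitioned_sink (
  id INT,
  v STRING,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/fs_partitioned_sink',
  'format' = 'csv',
  -- decide committability from the watermark passing the partition time
  'sink.partition-commit.trigger' = 'partition-time',
  -- extra delay before the partition is considered committable
  'sink.partition-commit.delay' = '1 h',
  -- write a _SUCCESS file when the partition is committed
  'sink.partition-commit.policy.kind' = 'success-file'
);
{code}

With the {{partition-time}} trigger, a partition is committed once the 
watermark passes the partition time plus the configured delay, so a too-small 
delay can mark a partition committable while late data is still being written 
to it.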

> The real partition data produced time is behind meta(_SUCCESS) file produced
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-22472
>                 URL: https://issues.apache.org/jira/browse/FLINK-22472
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Connectors / Hive
>            Reporter: Leonard Xu
>            Priority: Major
>         Attachments: image-2021-05-25-14-27-40-563.png
>
>
> I tested writing some data to a csv file with the flink filesystem connector, 
> but after the success file was produced, the data file was still 
> un-committed, which is very weird to me.
> {code:java}
> bang@mac db1.db $ll 
> /var/folders/55/cw682b314gn8jhfh565hp7q00000gp/T/junit8642959834366044048/junit484868942580135598/test-partition-time-commit/d\=2020-05-03/e\=12/
> total 8
> drwxr-xr-x  4 bang  staff  128  4 25 19:57 ./
> drwxr-xr-x  8 bang  staff  256  4 25 19:57 ../
> -rw-r--r--  1 bang  staff   12  4 25 19:57 
> .part-b703d4b9-067a-4dfe-935e-3afc723aed56-0-4.inprogress.b7d9cf09-0f72-4dce-8591-b61b1d23ae9b
> -rw-r--r--  1 bang  staff    0  4 25 19:57 _MY_SUCCESS
> {code}
>  
> After some debugging I found I have to set {{sink.rolling-policy.file-size}} or 
> {{sink.rolling-policy.rollover-interval}}; the default values of these two 
> parameters are pretty big (128MB and 30min). That's not convenient for 
> test/demo. I think we can improve this.
>  
> As the doc[1] describes, for row formats (csv, json), you can set the 
> parameter {{sink.rolling-policy.file-size}} or 
> {{sink.rolling-policy.rollover-interval}} in the connector properties, 
> together with {{execution.checkpointing.interval}} in flink-conf.yaml, if 
> you don't want to wait a long period before observing the data in the file 
> system.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/filesystem/#rolling-policy
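>  
> To make data visible quickly in a test, the settings described above can be 
> sketched as follows (the table name, path, and values are examples for 
> demo purposes, not recommendations):
> {code:sql}
> CREATE TABLE fs_sink (
>   id INT,
>   v STRING
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'file:///tmp/fs_sink',
>   'format' = 'csv',
>   -- roll in-progress files much sooner than the 128MB / 30min defaults
>   'sink.rolling-policy.file-size' = '1MB',
>   'sink.rolling-policy.rollover-interval' = '1 min'
> );
> {code}
> and in flink-conf.yaml, a short checkpoint interval so in-progress part 
> files are finalized quickly:
> {code}
> execution.checkpointing.interval: 10 s
> {code}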



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
