[ https://issues.apache.org/jira/browse/FLINK-22472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
luoyuxia updated FLINK-22472:
-----------------------------
Comment: was deleted

(was: I think this problem can be caused by two reasons:
1. Although the partition is committable according to the partition commit policy you configured, there is still data remaining to be written to this partition. In this case, you may need to check your partition commit policy.
2. Currently, the StreamingFileWriter is not aware that the partition is committable, so it does not commit all the files in this partition. As a result, although the partition has been committed, the files in this partition haven't been committed. In this case, we can modify the logic of StreamingFileWriter to alleviate the problem.)

> The real partition data produced time is behind meta(_SUCCESS) file produced
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-22472
>                 URL: https://issues.apache.org/jira/browse/FLINK-22472
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Connectors / Hive
>            Reporter: Leonard Xu
>            Priority: Major
>         Attachments: image-2021-05-25-14-27-40-563.png
>
>
> I tested writing some data to a CSV file with the Flink filesystem connector, but after the success file was produced, the data file was still uncommitted, which seemed very weird to me.
> {code:java}
> bang@mac db1.db $ll /var/folders/55/cw682b314gn8jhfh565hp7q00000gp/T/junit8642959834366044048/junit484868942580135598/test-partition-time-commit/d\=2020-05-03/e\=12/
> total 8
> drwxr-xr-x  4 bang  staff  128  4 25 19:57 ./
> drwxr-xr-x  8 bang  staff  256  4 25 19:57 ../
> -rw-r--r--  1 bang  staff   12  4 25 19:57 .part-b703d4b9-067a-4dfe-935e-3afc723aed56-0-4.inprogress.b7d9cf09-0f72-4dce-8591-b61b1d23ae9b
> -rw-r--r--  1 bang  staff    0  4 25 19:57 _MY_SUCCESS
> {code}
>
> After some debugging I found that I have to set {{sink.rolling-policy.file-size}} or {{sink.rolling-policy.rollover-interval}}; the default values of these two parameters are pretty big (128MB and 30min). That's not convenient for a test/demo. I think we can improve this.
>
> As the doc[1] describes, for row formats (csv, json), you can set the parameter {{sink.rolling-policy.file-size}} or {{sink.rolling-policy.rollover-interval}} in the connector properties, together with the parameter {{execution.checkpointing.interval}} in flink-conf.yaml, if you don't want to wait a long period before observing the data in the file system.
> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/filesystem/#rolling-policy

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
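The workaround described above (lowering the rolling policy thresholds so files commit shortly after each checkpoint) can be sketched as a Flink SQL DDL. This is a minimal illustrative example, not taken from the issue: the table name, schema, and path are hypothetical, and the option values are deliberately small for a test/demo setting.

{code:sql}
-- Hypothetical filesystem sink; files roll over at 1 MB or after 1 minute,
-- so they become committable soon after the next checkpoint completes.
CREATE TABLE fs_sink (
  f0 STRING,
  d  STRING,
  e  STRING
) PARTITIONED BY (d, e) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/test-partition-time-commit',
  'format' = 'csv',
  'sink.rolling-policy.file-size' = '1MB',
  'sink.rolling-policy.rollover-interval' = '1min',
  'sink.partition-commit.policy.kind' = 'success-file'
);
{code}

This assumes a correspondingly short {{execution.checkpointing.interval}} (e.g. 1min) in flink-conf.yaml, since in-progress files for row formats are only finalized on checkpoint.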