[ https://issues.apache.org/jira/browse/FLINK-17505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-17505:
-----------------------------------
    Labels: auto-deprioritized-major  (was: stale-major)

> Merge small files produced by StreamingFileSink
> -----------------------------------------------
>
>                 Key: FLINK-17505
>                 URL: https://issues.apache.org/jira/browse/FLINK-17505
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.10.0
>            Reporter: Piotr Nowojski
>            Priority: Major
>              Labels: auto-deprioritized-major
>
> This is an alternative approach to FLINK-11499 for solving the problem of
> StreamingFileSink creating many small files with bulk formats (which have to
> be rolled on checkpoint).
> A merge-based approach would require converting {{StreamingFileSink}} from a
> sink into an operator that works exactly as it does right now, with the same
> limitations (no support for arbitrary rolling policies for bulk formats),
> followed by another operator tasked with merging small files in the
> background.
> In the long term, we would probably like to have both the merge operator and
> the write-ahead log solution (the WAL described in FLINK-11499) as
> alternatives: the WAL would behave better if small files are common, while
> the merge operator could behave better if small files are rare (because of
> data skew, for example).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)