[ https://issues.apache.org/jira/browse/FLINK-17505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327854#comment-17327854 ]
Flink Jira Bot commented on FLINK-17505:
----------------------------------------

This major issue is unassigned, and neither it nor any of its Sub-Tasks has been updated for 30 days. So, it has been labeled "stale-major". If this ticket is indeed "major", please either assign yourself or give an update. Afterwards, please remove the label. In 7 days the issue will be deprioritized.

> Merge small files produced by StreamingFileSink
> -----------------------------------------------
>
>                 Key: FLINK-17505
>                 URL: https://issues.apache.org/jira/browse/FLINK-17505
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>   Affects Versions: 1.10.0
>            Reporter: Piotr Nowojski
>            Priority: Major
>              Labels: stale-major
>
> This is an alternative approach to FLINK-11499, to solve the problem of
> creating many small files with bulk formats in StreamingFileSink (which have
> to be rolled on checkpoint).
> A merge-based approach would require converting {{StreamingFileSink}} from a
> sink into an operator that would work exactly as it does right now, with the
> same limitations (no support for arbitrary rolling policies for bulk
> formats), followed by another operator that would be tasked with merging
> small files in the background.
> In the long term we would probably like to have both the merge operator and
> the write-ahead log solution (WAL, described in FLINK-11499) as alternatives,
> as the WAL would behave better if small files are more common, and the merge
> operator could behave better if small files are rare (because of data skew,
> for example).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)