GitHub user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/6075

Hi @zhangminglei

Sorry for the late response. I thought about this solution quite a bit and came to the conclusion that we may need to do a bit more to get efficient results.

Please take a look at [FLINK-9749](https://issues.apache.org/jira/browse/FLINK-9749) and the subtask [FLINK-9753](https://issues.apache.org/jira/browse/FLINK-9753).

The description there outlines why I believe the simple approach suggested here may not be enough: it will frequently result in badly compressed ORC/Parquet.

We have already started this effort to completely redesign the BucketingSink. The initial work in progress looks quite promising.
---