GitHub user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-186219363

OK, then let's keep the data in one partition for now. In case of var-length updates, this can default to a memory usage / combine behavior that is somewhat similar to the sort-based strategy: filling the memory with records and emitting them (putting compaction aside).

I'll review the PR once more and will run a few end-to-end benchmarks as well. What kind of benchmarks have you done so far?

- Did you check the combine rate (input / output ratio) compared to the sort-based strategy?
- How much memory did you use for tests (upper bound)? Did you vary the memory?
- Have you checked heap memory consumption / GC activity compared to the sort-based strategy?

It might take a few more days before I actually get to this, but it is on my list.

Thanks, Fabian