GitHub user NicoK opened a pull request: https://github.com/apache/flink/pull/6115
[FLINK-9435][java] Remove per-key selection Tuple instantiation via reflection in ComparableKeySelector and ArrayKeySelector

## What is the purpose of the change

Inside `KeySelectorUtil`, every `ComparableKeySelector#getKey()` call currently creates a new tuple via `Tuple.getTupleClass(keyLength).newInstance();`, which seems expensive. Instead, we could create a template tuple once and use `Tuple#copy()`, which copies the right sub-class more efficiently. Similarly, `ArrayKeySelector` instantiates new `Tuple` instances via reflection (albeit caching the required tuple class for the returned key), which can be changed in the same way.

With the micro-benchmarks added for a very simple job that basically only does a `keyBy()` (https://github.com/dataArtisans/flink-benchmarks/pull/5), I get these results:

```
Benchmark                    Mode  Cnt     Score     Error   Units
------------- old ---------------
KeyByBenchmarks.arrayKeyBy  thrpt    9  1055.706 ± 170.221  ops/ms
KeyByBenchmarks.tupleKeyBy  thrpt    9  1537.923 ± 271.665  ops/ms
------------- new ---------------
KeyByBenchmarks.arrayKeyBy  thrpt    9  1213.073 ±  39.672  ops/ms
KeyByBenchmarks.tupleKeyBy  thrpt    9  1848.172 ± 188.013  ops/ms
```

That is roughly 15% higher throughput for the `ArrayKeySelector` and 20% higher for the `ComparableKeySelector`.

## Brief change log

  - optimise `ComparableKeySelector` using a template `Tuple` instance determined once
  - optimise `ArrayKeySelector` using a template `Tuple` instance determined once

## Verifying this change

This change is already covered by existing tests.
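For readers unfamiliar with the pattern, the idea behind the change log above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code in this pull request: `SimpleTuple`, `SimpleTuple2`, `ReflectiveKeySelector`, and `TemplateKeySelector` are hypothetical stand-ins for Flink's `Tuple` classes and key selectors, and a record is assumed to be a plain `Object[]` whose first two fields form the key.

```java
public class TemplateTupleSketch {

    /** Simplified stand-in for a tuple base class; NOT Flink's Tuple. */
    abstract static class SimpleTuple {
        abstract void setField(Object value, int pos);
        abstract Object getField(int pos);
        /** Shallow copy that returns an instance of the same concrete sub-class. */
        abstract SimpleTuple copy();
    }

    /** Hypothetical two-field tuple with the public no-arg constructor reflection needs. */
    static final class SimpleTuple2 extends SimpleTuple {
        Object f0;
        Object f1;

        public SimpleTuple2() {}

        @Override
        void setField(Object value, int pos) {
            if (pos == 0) {
                f0 = value;
            } else if (pos == 1) {
                f1 = value;
            } else {
                throw new IndexOutOfBoundsException(String.valueOf(pos));
            }
        }

        @Override
        Object getField(int pos) {
            if (pos == 0) {
                return f0;
            }
            if (pos == 1) {
                return f1;
            }
            throw new IndexOutOfBoundsException(String.valueOf(pos));
        }

        @Override
        SimpleTuple2 copy() {
            SimpleTuple2 t = new SimpleTuple2();
            t.f0 = f0;
            t.f1 = f1;
            return t;
        }
    }

    /** "Old" pattern: a reflective constructor call for every record. */
    static final class ReflectiveKeySelector {
        SimpleTuple getKey(Object[] record) {
            try {
                SimpleTuple key = SimpleTuple2.class.getDeclaredConstructor().newInstance();
                key.setField(record[0], 0);
                key.setField(record[1], 1);
                return key;
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("Could not instantiate key tuple", e);
            }
        }
    }

    /** "New" pattern: the template is determined once, then copied for every record. */
    static final class TemplateKeySelector {
        private final SimpleTuple template = new SimpleTuple2();

        SimpleTuple getKey(Object[] record) {
            SimpleTuple key = template.copy(); // plain constructor call, no reflection
            key.setField(record[0], 0);
            key.setField(record[1], 1);
            return key;
        }
    }

    public static void main(String[] args) {
        Object[] record = {"user-1", 42, "payload"};
        SimpleTuple a = new ReflectiveKeySelector().getKey(record);
        SimpleTuple b = new TemplateKeySelector().getKey(record);
        System.out.println(a.getField(0) + "," + a.getField(1));
        System.out.println(b.getField(0) + "," + b.getField(1));
    }
}
```

Both selectors in this sketch return a fresh tuple per call, so callers can hold on to the key; the second variant only moves the sub-class lookup out of the per-record path.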
## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): **no**
  - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
  - The serializers: **no**
  - The runtime per-record code paths (performance sensitive): **yes**
  - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
  - The S3 file system connector: **no**

## Documentation

  - Does this pull request introduce a new feature? **no**
  - If yes, how is the feature documented? **not applicable**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-9435

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6115

----
commit b0c6944b861f57ade038c492141febf8fa9b7502
Author: Nico Kruber <nico@...>
Date:   2018-05-24T22:09:37Z

    [FLINK-9435][java] optimise ComparableKeySelector for more efficient Tuple creation

commit 77a8349a7c51ec96bf49eee841b0c05acb1815f6
Author: Nico Kruber <nico@...>
Date:   2018-06-04T11:30:57Z

    [FLINK-9435][java] optimise ArrayKeySelector for more efficient Tuple creation

----