Github user ggevay commented on the issue: https://github.com/apache/flink/pull/3511 > IMHO, in addition to these changes, there are still some potential improvements we can do about the sorter, like deserialization when comparing the real records. Do you mean `NormalizedKeySorter.compareRecords`? This calls `comparator.compareSerialized`, which is currently implemented for most types by simply deserializing the records and then comparing the deserialized form. It would maybe bring some performance improvement if this had a real implementation, which would deserialize only the key fields (which are relevant to the comparison). The hard part here is how to handle nullable fields, which make the offset of individual fields unpredictable. So I think a simpler and better approach is to just make sure that most types have a good implementation of `putNormalizedKey`, and then `NormalizedKeySorter.compareRecords` would be called only rarely, so its performance wouldn't really matter. > We are working on a project to fully code generate the algorithm Flink runtime used, and borrowed lots of idea of this PR, thanks! I'm happy to hear that! Btw. I think a large potential in code-generation is to eliminate the overheads of the very many virtual function calls throughout the runtime, by making the calls monomorphic [1,2]. For example, there could be a custom `MapDriver` generated for each map UDF, and then this custom class would always call the same UDF, making the call monomorphic, and thus easily optimizable by the JVM [1,2]. (I think @greghogan was also thinking about this.) For example, the following virtual calls are on the per-record code path: - drivers (such as `MapDriver`) calling `MutableObjectIterator.next` - drivers calling the UDF - drivers calling `output.collect` - `ReusingDeserializationDelegate` or `NonReusingDeserializationDelegate` calling `serializer.deserialize` - `SpillingResettableMutableObjectIterator` calling `serializer.serialize` and `serializer.deserialize` - `PojoSerializer`, `TupleSerializer`, `RowSerializer`, etc. calling their field serializers - `CountingCollector` calling `collector.collect` - `RecordWriter` calling `channelSelector.selectChannels` - `SerializationDelegate` calling `serializer.serialize` - `StreamElementSerializer` calling methods on its `typeSerializer` - `OutputEmitter` calling `comparator.hash` - `ComparableKeySelector` calling `comparator.extractKeys` - `assignToKeyGroup` calling `key.hashCode` I think most of the above calls are megamorphic (especially in larger Flink jobs with many operators), which makes them slow [1,2]. They could be made monomorphic by code-generating custom versions of these classes, where the customization would be to fix the type of the targets of these calls. (I think this could probably be done independently of the Table API.) Another potential for code-generation is customizing `OutputEmitter` for the number of channels: there are places where a modulo operation is performed, which is much faster if the divisor is a compile-time constant, since then the compiler uses such tricks as described in this paper: https://gmplib.org/~tege/divcnst-pldi94.pdf And the same optimization could be made in `KeyGroupRangeAssignment`, where there are divisions by `maxParallelism`. > we need more type information control and flexible code generating supports, so we choose to do it through the Table API & SQL. How do you see this approach? Having more info about the types, UDFs, etc. of your program certainly can help. Unfortunately I don't know too much about the Table API & SQL, but I have a few random thoughts: - Since you are generating the UDFs, it should be possible to make them have good object reuse behavior, without the users having to worry about the non-trivial object reuse rules. - For basic types (int, boolean, etc.) the generated code could use `IntValue`, `BooleanValue`, etc., facilitating more object reuse, again without making the users write ugly boilerplate (as they have to do in the DataSet/DataStream API when they want to use these classes). - Same thing with `StringValue` Btw. have you seen this PR for code generation for POJO serializers and comparators? https://github.com/apache/flink/pull/2211 It has some issues, so it is not as close to merging as this PR, but maybe I'll try to tie that up as well some time. [1] http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/ [2] https://shipilev.net/blog/2015/black-magic-method-dispatch/
---