Hi If you just want to make sure some key goes into the same subtask, does custom key selector[1] help?
For the keygroup and subtask information, you can ref to KeyGroupRangeAssignment[2] for more info, and the max parallelism logic you can ref to doc[3] [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-key-selector-functions [2] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java [3] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#setting-the-maximum-parallelism Best, Congxian 杨东晓 <laolang...@gmail.com> 于2020年1月9日周四 上午7:47写道: > Hi , I'm trying to do some optimize about Flink 'keyby' processfunction. > Is there any possible I can find out one key belongs to which key-group > and essentially find out one key-group belongs to which subtask. > The motivation I want to know that is we want to force the data records > from upstream still goes to same taskmanager downstream subtask .Which > means even if we use a keyedstream function we still want no cross jvm > communication happened during run time. > And if we can achieve that , can we also avoid the expensive cost for > record serialization because data is only transferred in same taskmanager > jvm instance? > > Thanks. >