Hi, I have a streaming topology with source parallelism of M and a target operator parallelism of N. For optimum performance I have found that I need to choose M and N independently. Also, the source subtasks do not all produce the same number of records and therefor I have to rebalance to the target operator to get optimum throughput.
The record sizes vary a lot (up to 10MB) but are about 200kB on average. Through experimentation using the rescale() operator I have found that maximum throughput can be significantly increased if I restrict this rebalancing to target subtasks within the same TaskManager instances. However I cannot use rescale for this purpose as it does not do a rebalancing to all target subtasks in the instance. I was hoping to use a custom Partitioner to achieve this but it is not clear to me which partition would correspond to which subTask. Is there any way currently to achieve this with Flink? If it helps I believe the feature I am hoping to achieve is similar to Storm's "Local or shuffle grouping". Any help or suggestions will be appreciated. Hans -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/