Hi, I have a use case where we read messages from a Kafka topic and invoke a webservice. The web-service call can take a take couple of seconds and then gives us back on avg 800KB of data. This data is set to another operator which does the parsing and then it gets sent to sink which saves the processed data in a NoSQL db. The app looks like this :
[image: Inline image 1] Since my payload from the web service is large a lot of data is transferred over the network and this is becoming a bottle neck. Lets say *I have 6 slots per node and I would like to have 1 slot for source, 3 slots for web service calls, 2 for parser and 1 for my sink*. This way all the processing can happen locally and there is no network overhead. I have tried *stream.forward() *but it requires that the down stream operator has the same number of parallelism as the one emitting data. Next I tried *stream.rescale()* and that does not schedule the task as I would expect it given the parallelism's on the operators are all multiple of each other (my flink cluster has enough empty slots and capacity). [image: Inline image 2] Is there a way to schedule my task's in a fashion where there is no data transfer over the network. I was able to do this in apache storm by using localOrShuffle grouping. Not sure how to acehive the same in flink. Any pointers would be really helpful. For now I have solved this problem by having the same parallelism on the web-service operator, parser, sink which causes flink to chain these operator together and execute them in the same thread.But ideally I would like to have more instances of the slow operator and less instances of my fast operator. ~ Karthik