Initially, we had a Flink pipeline with the following flow: Kafka Source -> KeyBy ID -> Map -> Kafka Sink.
We now want to enrich this pipeline with a side input from a file. Following the recommendation here <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Best-pattern-for-achieving-stream-enrichment-side-input-from-a-large-static-source-td25771.html#a25780>, we modified our pipeline to the following flow:

Kafka Source ----------------------------> KeyBy ID ----
                                                         \
                                                          CoFlatMap -----> Sink
                                                         /
ContinuousFileMonitoringFunction --------> KeyBy ID ----
          (emits (ID, static data))

The file is relatively small (a few GBs), but not small enough to fit in the RAM of the driver node, so we cannot load it there.

The issue I am facing is high back pressure in this new pipeline after adding the file source. The initial job (without the side input) worked fine with good throughput. However, after adding the second source and the CoFlatMap function, even 4x-ing the parallelism does not solve the problem, and back pressure remains high.

1. The back pressure is high between the Kafka source and the CoFlatMap function. Is this expected? Could someone help with the reasoning behind why this new pipeline is much more resource intensive than the original one?
2. Are there any caveats to keep in mind when connecting a bounded stream with an unbounded stream?
3. Is this the recommended design pattern for enriching a stream with a static data set, or is there a better way to solve the problem?

Also, we are using Flink 1.6.3, so we cannot use any API advancements from later Flink versions.

Thanks!

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
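For context on question 2, one caveat of this pattern is that the two connected streams interleave arbitrarily, so Kafka events for a key can arrive before that key's static record, forcing the CoFlatMap to buffer them in keyed state; this per-key buffering is one plausible contributor to the extra resource usage. The sketch below is plain Python (not the Flink API; all names are illustrative) and simulates the buffering behavior of a keyed CoFlatMapFunction, where flatMap1 handles the Kafka side and flatMap2 the file side:

```python
from collections import defaultdict


class EnrichmentCoFlatMap:
    """Simulates a keyed CoFlatMapFunction for stream enrichment.

    flat_map1 receives unbounded Kafka events; flat_map2 receives
    (id, static_data) records from the bounded file source. Events
    that arrive before their static data are buffered per key.
    """

    def __init__(self):
        self.static_by_id = {}             # "value state" holding file-side data per key
        self.buffered = defaultdict(list)  # events waiting for their static data

    def flat_map1(self, event):
        """Kafka side: enrich immediately if static data is known, else buffer."""
        key = event["id"]
        if key in self.static_by_id:
            return [{**event, "enrichment": self.static_by_id[key]}]
        self.buffered[key].append(event)   # state grows until the file side catches up
        return []

    def flat_map2(self, record):
        """File side: store static data, then flush any buffered events for this key."""
        key, data = record
        self.static_by_id[key] = data
        return [{**e, "enrichment": data} for e in self.buffered.pop(key, [])]


fn = EnrichmentCoFlatMap()
results = []
results += fn.flat_map1({"id": "a", "value": 1})  # buffered: static data not seen yet
results += fn.flat_map2(("a", "A-data"))          # flushes the buffered event
results += fn.flat_map1({"id": "a", "value": 2})  # enriched immediately
```

After these three calls, `results` holds both events enriched with "A-data". In real Flink the buffer would live in checkpointed keyed state, so the slower the file side advances, the more state the operator carries.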