Initially, we had a Flink pipeline that did the following:
Kafka source -> KeyBy ID -> Map -> Kafka sink.

We now want to enrich this pipeline with a side input from a file, similar
to the recommendation here
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Best-pattern-for-achieving-stream-enrichment-side-input-from-a-large-static-source-td25771.html#a25780>.
We modified our pipeline to have the following flow:

Kafka source ------------------------------------> KeyBy ID ---+
                                                               +--> CoFlatMap --> Kafka sink
Continuous File Monitoring Function -------------> KeyBy ID ---+
  (emits (ID, static data) records)
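To make the CoFlatMap behaviour concrete, here is a minimal model of the per-key enrichment logic we have in mind (sketched in plain Python for illustration only; this is not Flink code, and all names are made up). Events that arrive before the static record for their key get buffered, and are flushed once the static data arrives:

```python
class KeyedEnrichment:
    """Model of a keyed CoFlatMap joining an event stream with
    (ID, static_data) records. Events arriving before the static
    record for their key are buffered, then flushed on arrival."""

    def __init__(self):
        self.static = {}    # key -> static data  (file-source side)
        self.pending = {}   # key -> buffered events (Kafka side)

    def on_event(self, key, event):
        """flatMap1: enrich immediately if static data is present,
        otherwise buffer the event (this buffered state can grow)."""
        if key in self.static:
            return [(event, self.static[key])]
        self.pending.setdefault(key, []).append(event)
        return []

    def on_static(self, key, data):
        """flatMap2: store static data, flush any buffered events."""
        self.static[key] = data
        return [(e, data) for e in self.pending.pop(key, [])]


# An event arriving before its static record is buffered, then flushed.
j = KeyedEnrichment()
assert j.on_event("id-1", "click") == []
assert j.on_static("id-1", "profile") == [("click", "profile")]
assert j.on_event("id-1", "view") == [("view", "profile")]
```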

The file is relatively small (a few GB) but not small enough to fit in the
RAM of a single node, so we cannot simply load it into memory there.

The issue I am facing is high backpressure in this new pipeline after
adding the file source. The initial job (without the side input) worked
fine with good throughput. However, after adding the second source and the
CoFlatMap function, even quadrupling the parallelism does not solve the
problem; the backpressure stays high.
1. The backpressure is high between the Kafka source and the CoFlatMap
function. Is this expected? Could someone help with the reasoning behind
why this new pipeline is much more resource-intensive than the original
one?
2. Are there any caveats to keep in mind when connecting a bounded stream
with an unbounded stream? 
3. Also, is this the recommended design pattern for enriching a stream with
a static data set? Is there a better way to solve the problem? 

Also, we are on Flink 1.6.3, so we cannot use any API improvements from
later Flink versions.

Thanks! 


