Hi, 

With a.connect(b).coprocess(xx).connect(c).coprocess(xx), there would create two
operators, the first operators would union a and b and output the enriched 
data, 
and then .connect(c).coprocess(xx) would pass-throught the already enriched data
and enrich the record from c. Since the two operators could not get chained, 
the performance
seems would be affected.

Another method is to first label each input with a tag, e.g., ("a", a record), 
("b", b record), ..
and then use 

a.union(b).union(c).union(d).process(xx)

then in the process operator, different logic could be chosen according to the 
tag. 

If adding tag is hard, then it might need to use the new multiple-inputs 
operator, which somehow would need
to use the low-level API of Flink, thus I would recommend the above tag + union 
method first.

Best, 
Yun
 ------------------Original Mail ------------------
Sender:B.B. <bijela.vr...@gmail.com>
Send Date:Fri Apr 2 16:41:16 2021
Recipients:flink_user <user@flink.apache.org>
Subject:Union of more then two streams

Hi,

I have an architecture question regarding the union of more than two streams in 
Apache Flink.

We are having three and sometime more streams that are some kind of code book 
with whom we have to enrich main stream.
Code book streams are compacted Kafka topics. Code books are something that 
doesn't change so often, eg currency. Main stream is a fast event stream.

Idea is to make a union of all code books and then join it with main stream and 
store the enrichment data as managed, keyed state (so when compact events from 
kafka expire I have the codebooks saved in state).

The problem is that enriched data foreign keys of every code book is different. 
Eg. codebook_1 has foreign key id codebook_fk1, codebook_2 has foreign key 
codebook_fk2,…. that connects with main stream.
This means I cannot use the keyBy with coProcessFunction.

Is this doable with union or I should cascade a series of connect streams with 
main stream, eg. mainstream.conect(codebook_1) -> 
mainstreamWihtCodebook1.connect(codebook_2) - > 
mainstreamWithCodebook1AndCodebook2.connect(codebook_3) - > ….?
I read somewhere that this later approach is not memory friendly.

Thx.

BB.

Reply via email to