Hello BB,

I just want to share some rough ideas with you. Others with more experience 
may be able to offer better solutions and advice.

  1.  DataStream-based solution:
     *   As you already know, a union requires all of the streams to have the 
same type; otherwise it is not possible. There is a workaround for your 
problem. You can ingest each code book stream with a DeserializationSchema and 
map the different code book streams to the same Java type. One field holds the 
foreign key value (the values of codebook_fk1, codebook_fk2, and so on are all 
stored here), and another field holds the name of the foreign key (e.g. 
codebook_fk1). All other fields should also be generalized into this Java 
type. After that, you can union the different code book streams and join the 
result with the main stream.
     *   Cascading connected streams is, I guess, not a recommended approach. 
In addition to the memory cost, I think it will also make the watermarks hard 
to coordinate.
  2.  Flink SQL approach:

You can try a Flink temporal table join to do the join here [1][2]. With this 
approach, you cascade the joins to enrich the main stream. This seems to fit 
your use case, since your enrichment streams don’t change often and contain 
things like currencies. For this kind of join there may be some internal 
optimization that avoids some of the memory consumption issues, though I am 
not sure about that. Either way, it is worth taking a look.
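
To make the DataStream idea from item 1 a bit more concrete, here is a minimal 
plain-Java sketch of the "same Java type" mapping. It deliberately does not use 
the Flink API: lists stand in for streams and a HashMap stands in for state. 
The names CodebookEntry, MainEvent, Enricher and union, as well as the sample 
values, are made up for illustration and are not from your job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CodebookUnionSketch {

    /** Hypothetical common type that every code book record is mapped to. */
    static final class CodebookEntry {
        final String fkName;   // name of the foreign key, e.g. "codebook_fk1"
        final String fkValue;  // value of that foreign key
        final String payload;  // remaining fields, generalized (here: one string)

        CodebookEntry(String fkName, String fkValue, String payload) {
            this.fkName = fkName;
            this.fkValue = fkValue;
            this.payload = payload;
        }
    }

    /** Hypothetical main-stream event carrying one foreign key per code book. */
    static final class MainEvent {
        final String id;
        final Map<String, String> foreignKeys; // fkName -> fkValue

        MainEvent(String id, Map<String, String> foreignKeys) {
            this.id = id;
            this.foreignKeys = foreignKeys;
        }
    }

    /** Stand-in for state: keeps the latest entry per (fkName, fkValue). */
    static final class Enricher {
        private final Map<String, CodebookEntry> state = new HashMap<>();

        private static String key(String fkName, String fkValue) {
            return fkName + "|" + fkValue;
        }

        /** Called for every record of the unioned code book stream. */
        void onCodebook(CodebookEntry entry) {
            state.put(key(entry.fkName, entry.fkValue), entry);
        }

        /** Called for every main-stream event; unknown keys map to null. */
        Map<String, String> enrich(MainEvent event) {
            Map<String, String> enriched = new LinkedHashMap<>();
            for (Map.Entry<String, String> fk : event.foreignKeys.entrySet()) {
                CodebookEntry hit = state.get(key(fk.getKey(), fk.getValue()));
                enriched.put(fk.getKey(), hit == null ? null : hit.payload);
            }
            return enriched;
        }
    }

    /** Plain-Java analogue of a union: concatenate same-typed streams. */
    static List<CodebookEntry> union(List<List<CodebookEntry>> streams) {
        List<CodebookEntry> all = new ArrayList<>();
        for (List<CodebookEntry> stream : streams) {
            all.addAll(stream);
        }
        return all;
    }

    public static void main(String[] args) {
        // Two code books with different foreign keys, mapped to one type.
        List<CodebookEntry> book1 = Collections.singletonList(
                new CodebookEntry("codebook_fk1", "EUR", "Euro"));
        List<CodebookEntry> book2 = Collections.singletonList(
                new CodebookEntry("codebook_fk2", "HR", "Croatia"));

        Enricher enricher = new Enricher();
        for (CodebookEntry entry : union(Arrays.asList(book1, book2))) {
            enricher.onCodebook(entry);
        }

        Map<String, String> fks = new LinkedHashMap<>();
        fks.put("codebook_fk1", "EUR");
        fks.put("codebook_fk2", "HR");
        System.out.println(enricher.enrich(new MainEvent("event-1", fks)));
    }
}
```

In an actual job the lists would be DataStreams combined with union, and the 
HashMap would be Flink state. The sketch does not address how the main stream 
is partitioned when one event carries several different foreign keys; that 
still has to be decided for the real pipeline.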




Reference:
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/joins.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/joins.html#event-time-temporal-join

Best,
Fuyao



From: B.B. <bijela.vr...@gmail.com>
Date: Friday, April 2, 2021 at 01:41
To: user@flink.apache.org <user@flink.apache.org>
Subject: [External] : Union of more then two streams
Hi,

I have an architecture question regarding the union of more than two streams in 
Apache Flink.

We have three, and sometimes more, streams that act as code books with which 
we have to enrich the main stream.
Code book streams are compacted Kafka topics. Code books are something that 
doesn't change very often, e.g. currency. The main stream is a fast event stream.

The idea is to make a union of all code books, then join it with the main 
stream and store the enrichment data as managed, keyed state (so that when the 
compacted events in Kafka expire, I still have the code books saved in state).

The problem is that the foreign key of each code book is different. 
E.g. codebook_1 has foreign key codebook_fk1, codebook_2 has foreign key 
codebook_fk2, …, each of which connects to the main stream.
This means I cannot use keyBy with a CoProcessFunction.

Is this doable with union, or should I cascade a series of connected streams 
with the main stream, e.g. mainstream.connect(codebook_1) -> 
mainstreamWithCodebook1.connect(codebook_2) -> 
mainstreamWithCodebook1AndCodebook2.connect(codebook_3) -> ….?
I read somewhere that this latter approach is not memory friendly.

Thx.

BB.
