Hi,

We are using Flink for financial data enrichment and aggregations. We have
Positions data that we are currently receiving from Kafka. We want to
enrich that data with reference data like Product and Account information
that is present in a relational database. From my understanding of Flink so
far, there are two ways to achieve this:

1) First Approach:
a) Get positions from Kafka and key by product key.
b) Perform lookup from the database for each key and then obtain
Tuple2<Position, Product>
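To make the first approach concrete, the per-event lookup in (1b) is essentially a map step that hits the database once per position. Here is a minimal plain-Java sketch of that idea (the Position/Product types and the in-memory "database" are illustrative stand-ins, not our actual schema or Flink code):

```java
import java.util.*;

public class PerEventEnrichment {
    // Stand-in types; the real Position/Product classes will differ.
    record Position(String productKey, double quantity) {}
    record Product(String key, String name) {}

    static int dbCalls = 0;

    // Simulates a relational lookup; every call counts as one round trip.
    static Product lookupProduct(Map<String, Product> db, String key) {
        dbCalls++;
        return db.get(key);
    }

    public static void main(String[] args) {
        Map<String, Product> db = Map.of(
            "P1", new Product("P1", "Bond"),
            "P2", new Product("P2", "Equity"));

        List<Position> positions = List.of(
            new Position("P1", 100), new Position("P1", 50), new Position("P2", 10));

        // Approach 1: one lookup per event, yielding (Position, Product) pairs,
        // analogous to Tuple2<Position, Product> in Flink.
        List<Map.Entry<Position, Product>> enriched = new ArrayList<>();
        for (Position p : positions) {
            enriched.add(Map.entry(p, lookupProduct(db, p.productKey())));
        }

        System.out.println("events=" + enriched.size() + " dbCalls=" + dbCalls);
    }
}
```

Note that three events trigger three database calls, even though there are only two distinct product keys.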

2) Second Approach:
a) Get positions from Kafka and key by product key.
b) Window the keyed stream into say 15 seconds each.
c) For each window get the unique product keys and perform a single lookup.
d) Somehow join Positions and Products
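The batching idea in steps (2c)–(2d) can also be sketched without Flink: collect one window's positions, look up each distinct product key once (e.g. a single `IN (...)` query), and join the results back by key. The names below are illustrative, not Flink API:

```java
import java.util.*;
import java.util.stream.*;

public class WindowedEnrichment {
    // Stand-in types; the real Position/Product classes will differ.
    record Position(String productKey, double quantity) {}
    record Product(String key, String name) {}

    static int dbCalls = 0;

    // One batched lookup per window, e.g. SELECT ... WHERE key IN (...).
    static Map<String, Product> lookupProducts(Map<String, Product> db, Set<String> keys) {
        dbCalls++;
        Map<String, Product> out = new HashMap<>();
        for (String k : keys) out.put(k, db.get(k));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Product> db = Map.of(
            "P1", new Product("P1", "Bond"),
            "P2", new Product("P2", "Equity"));

        // One 15-second window's worth of positions.
        List<Position> window = List.of(
            new Position("P1", 100), new Position("P1", 50), new Position("P2", 10));

        // (2c) distinct product keys, then a single lookup for the window.
        Set<String> keys = window.stream()
            .map(Position::productKey)
            .collect(Collectors.toSet());
        Map<String, Product> products = lookupProducts(db, keys);

        // (2d) join positions back to products by key.
        List<Map.Entry<Position, Product>> enriched = window.stream()
            .map(p -> Map.entry(p, products.get(p.productKey())))
            .toList();

        System.out.println("events=" + enriched.size() + " dbCalls=" + dbCalls);
    }
}
```

Here the same three events cost a single database call per window.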

In the first approach we will be making one database call per event, so the
solution is very chatty. It's hard to scale because the database storing
the reference data might not be very responsive.
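For context on the latency cost of blocking per-event lookups: overlapping the calls (which Flink exposes via its Async I/O API, `AsyncDataStream` with an `AsyncFunction`) keeps total wall-clock time near one round trip instead of one per event. The plain-Java `CompletableFuture` sketch below is a generic illustration under assumed ~50 ms round trips, not our actual code:

```java
import java.util.*;
import java.util.concurrent.*;

public class AsyncLookups {
    record Product(String key, String name) {}

    // Simulated slow database round trip (~50 ms each).
    static Product slowLookup(String key) {
        try { Thread.sleep(50); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return new Product(key, "name-" + key);
    }

    // Issue all lookups concurrently and wait for the results.
    static List<Product> lookupAll(List<String> keys) {
        ExecutorService pool = Executors.newFixedThreadPool(keys.size());
        try {
            List<CompletableFuture<Product>> futures = new ArrayList<>();
            for (String k : keys) {
                futures.add(CompletableFuture.supplyAsync(() -> slowLookup(k), pool));
            }
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<Product> products = lookupAll(List.of("P1", "P2", "P3", "P4"));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // The four ~50 ms lookups overlap, so the total stays near 50 ms, not 200 ms.
        System.out.println(products.size() + " products in ~" + elapsedMs + " ms");
    }
}
```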

In the second approach, I wish to join the WindowedStream with the
SingleOutputStream, but it turns out I can't join a windowed stream, so I
am not quite sure how to do that.

I wanted an opinion on what the right thing to do is. Should I go with the
first approach or the second one? If the second one, how can I implement
the join?

-- 
Regards,
Harshvardhan Agrawal
