For the first approach (lookup of single entries) you could use a NoSQL database (e.g. a key-value store) - a relational database will not scale to per-record lookups at stream rates.
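To make the per-record lookup concrete, here is a minimal sketch of a cached key-value lookup in plain Java. The class and method names (ProductLookup, enrich) and the store adapter are illustrative assumptions, not Flink or any particular store's API; in a Flink job this logic would typically sit inside a RichMapFunction opened once per task, or behind Flink's async I/O so the lookup does not block the stream.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical enrichment step: resolve a product key against a key-value
// store, with a small in-process cache so repeated keys in the stream do
// not hit the store again.
class ProductLookup {
    private final Function<String, String> kvGet;       // adapter to the key-value store
    private final Map<String, String> cache = new HashMap<>();

    ProductLookup(Function<String, String> kvGet) {
        this.kvGet = kvGet;
    }

    // Returns the product record for the key, fetching from the store only
    // on a cache miss.
    String enrich(String productKey) {
        return cache.computeIfAbsent(productKey, kvGet);
    }
}
```

The cache only helps if product keys repeat within a task; it does not remove the need for a store that can serve fast point lookups.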

Depending on when you need the enriched result, you could also store the raw data first and enrich it later as part of a batch process.
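If you do batch the work (as in your second approach), the core idea - deduplicate the product keys seen in a window, then issue one lookup for all of them - can be sketched in plain Java. All names here are illustrative assumptions; batchFetch stands in for a single "SELECT ... WHERE key IN (...)" query or a multi-get against a key-value store.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch of "one lookup per window": gather the distinct product keys in a
// window's worth of positions, fetch them in a single batch call, then join
// each position to its product.
class WindowEnricher {
    // positions: [positionId, productKey] pairs; returns [positionId, product].
    static List<String[]> enrichWindow(
            List<String[]> positions,
            Function<Set<String>, Map<String, String>> batchFetch) {
        Set<String> keys = new HashSet<>();
        for (String[] p : positions) keys.add(p[1]);       // dedupe keys
        Map<String, String> products = batchFetch.apply(keys); // one round trip
        List<String[]> out = new ArrayList<>();
        for (String[] p : positions) {
            out.add(new String[]{p[0], products.get(p[1])});
        }
        return out;
    }
}
```

In Flink this would run inside a window function (e.g. a ProcessWindowFunction over the keyed, windowed stream), which sidesteps the stream-to-stream join you were struggling with: the "join" happens inside the window operator against the batch-fetched map.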

> On 24. Jul 2018, at 05:25, Harshvardhan Agrawal 
> <harshvardhan.ag...@gmail.com> wrote:
> 
> Hi,
> 
> We are using Flink for financial data enrichment and aggregations. We have 
> Positions data that we are currently receiving from Kafka. We want to enrich 
> that data with reference data, like Product and Account information, that is 
> stored in a relational database. From my understanding of Flink so far, there 
> are two ways to achieve this:
> 
> 1) First Approach:
> a) Get positions from Kafka and key by product key.
> b) Perform lookup from the database for each key and then obtain 
> Tuple2<Position, Product>
> 
> 2) Second Approach:
> a) Get positions from Kafka and key by product key.
> b) Window the keyed stream into say 15 seconds each.
> c) For each window get the unique product keys and perform a single lookup.
> d) Somehow join Positions and Products
> 
> In the first approach we will be making a lot of calls to the DB, so the 
> solution is very chatty. It's hard to scale because the database storing the 
> reference data might not be very responsive.
> 
> In the second approach, I wish to join the WindowedStream with the 
> SingleOutputStream, but it turns out I can't join a windowed stream, so I am 
> not quite sure how to implement step (d).
> 
> I wanted an opinion on the right thing to do. Should I go with the first 
> approach or the second one? If the second one, how can I implement the 
> join?
> 
> -- 
> Regards,
> Harshvardhan Agrawal