For the first approach (lookup of single entries) you could use a NoSQL database (e.g. a key-value store); a relational database will not scale well for per-record lookups.
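To make the per-record lookup pattern concrete, here is a minimal, Flink-independent Python sketch; the Position/Product field names and the in-memory dict standing in for a real key-value store (e.g. Redis) are all assumptions for illustration:

```python
# Hypothetical sketch of per-record enrichment against a key-value store.
# The dict stands in for a real KV store (e.g. Redis); the Position and
# Product field names are made up for illustration.

product_store = {  # mock key-value store: product_key -> product record
    "P1": {"product_key": "P1", "name": "Bond A"},
    "P2": {"product_key": "P2", "name": "Equity B"},
}

def enrich(position):
    """Look up the product for one position: one store call per record."""
    product = product_store.get(position["product_key"])
    return (position, product)  # analogous to Tuple2<Position, Product>

positions = [
    {"id": 1, "product_key": "P1", "qty": 100},
    {"id": 2, "product_key": "P2", "qty": 50},
]
enriched = [enrich(p) for p in positions]
```

In Flink itself, this kind of lookup is usually wrapped in an async operator so slow store responses do not block the stream.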
Depending on when you need to do the enrichment, you could also store the data first and enrich it later as part of a batch process.

> On 24. Jul 2018, at 05:25, Harshvardhan Agrawal <harshvardhan.ag...@gmail.com> wrote:
>
> Hi,
>
> We are using Flink for financial data enrichment and aggregations. We have
> Positions data that we are currently receiving from Kafka. We want to enrich
> that data with reference data, like Product and Account information, that is
> present in a relational database. From my understanding of Flink so far,
> there are two ways to achieve this:
>
> 1) First approach:
>    a) Get positions from Kafka and key by product key.
>    b) Perform a lookup from the database for each key and obtain a
>       Tuple2<Position, Product>.
>
> 2) Second approach:
>    a) Get positions from Kafka and key by product key.
>    b) Window the keyed stream into, say, 15-second windows.
>    c) For each window, get the unique product keys and perform a single lookup.
>    d) Somehow join Positions and Products.
>
> In the first approach we would be making a lot of calls to the database, and
> the solution is very chatty. It is hard to scale because the database storing
> the reference data might not be very responsive.
>
> In the second approach, I want to join the WindowedStream with the
> SingleOutputStream, but it turns out I can't join a windowed stream, so I am
> not quite sure how to do that.
>
> Which approach is the right one? And if the second, how can I implement the
> join?
>
> --
> Regards,
> Harshvardhan Agrawal
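For reference, the key-deduplication step of the second approach can be sketched outside of Flink: per window, collect the positions, resolve each distinct product key once with a single batched call, and join in memory. The field names and the mock `batch_lookup` are assumptions, not a real database API:

```python
# Hypothetical sketch of the second approach for one 15-second window:
# deduplicate product keys, issue a single batched lookup, then join.

def batch_lookup(keys):
    """Mock batched DB call: one round trip for all distinct keys."""
    db = {"P1": {"name": "Bond A"}, "P2": {"name": "Equity B"}}
    return {k: db[k] for k in keys if k in db}

def enrich_window(positions):
    distinct_keys = {p["product_key"] for p in positions}  # dedupe per window
    products = batch_lookup(distinct_keys)                 # one lookup total
    # Join each position back to its product, like Tuple2<Position, Product>.
    return [(p, products.get(p["product_key"])) for p in positions]

window = [
    {"id": 1, "product_key": "P1"},
    {"id": 2, "product_key": "P1"},
    {"id": 3, "product_key": "P2"},
]
pairs = enrich_window(window)
```

Note that this trades lookup volume for latency: every record waits up to the window length before it is enriched.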