We have an operator in our streaming application that needs to access 
'reference data' that is updated by another Flink streaming application. This 
reference data has about 10,000 entries and a small memory footprint, and it 
needs to be updated roughly every 100 ms. The required latency for 
this application is extremely low (a couple of milliseconds), so we are 
wary of paying the I/O cost of accessing the reference data remotely. 
We are currently examining three options for accessing this reference 
data:

1. Expose the reference data as QueryableState and access it directly from the 
'client' streaming operator using the QueryableState API.
2. Same as #1, but create an in-memory Java cache of the reference data within 
the operator that is asynchronously updated at a scheduled frequency using the 
QueryableState API.
3. Output the reference data to Redis, and create an in-memory Java cache of 
the reference data within the operator that is asynchronously updated at a 
scheduled frequency using the Redis API.

My understanding is that one drawback of QueryableState is that if 
the Flink application that generates the reference data is unavailable, the 
QueryableState will not exist. Is that correct?

If we were to use an asynchronously scheduled 'read' from the distributed 
cache, where should it be done? I was thinking of using a 
ScheduledExecutorService from within the open() method of the Flink operator.
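
To make the idea concrete, here is a rough sketch of what I have in mind, written in plain Java so it does not depend on any Flink classes. Everything in it is an assumption for illustration: the class name ReferenceDataCache, the String key/value types, and the loader Supplier, which stands in for whatever read we end up using (QueryableState or Redis). In the real operator, open() and close() would be called from the operator's own open() and close() methods.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: an immutable snapshot that a background thread replaces periodically. */
final class ReferenceDataCache {
    // Assumption: stands in for the remote read (QueryableState or Redis).
    private final Supplier<Map<String, String>> loader;
    private final long periodMillis;

    // volatile so the processing thread always sees the latest published snapshot.
    private volatile Map<String, String> snapshot = Map.of();
    private ScheduledExecutorService scheduler;

    ReferenceDataCache(Supplier<Map<String, String>> loader, long periodMillis) {
        this.loader = loader;
        this.periodMillis = periodMillis;
    }

    /** Intended to be called from the operator's open(). */
    void open() {
        refresh(); // synchronous first load, so lookups can succeed before the first scheduled run
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "reference-data-refresh");
            t.setDaemon(true); // do not keep the JVM alive because of this thread
            return t;
        });
        scheduler.scheduleWithFixedDelay(
                this::refresh, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void refresh() {
        try {
            // Build the new snapshot off to the side, then publish it with a single write.
            snapshot = Map.copyOf(loader.get());
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot. An exception escaping from here
            // would cancel all future scheduled runs.
        }
    }

    /** Local, in-memory lookup on the hot path; never performs I/O. */
    String get(String key) {
        return snapshot.get(key);
    }

    /** Intended to be called from the operator's close(). */
    void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

The per-record path would then only call get(), which reads the current snapshot from memory, so the remote read never sits on the latency-critical path. The cost is that lookups can be stale by up to one refresh period plus the duration of the load.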

What is the best way to get this done?

Regards,
Hayden Marchant
