Navneeth,

Thanks for posting this question.
This looks like a scenario we might end up in ourselves. We are working on a similar problem statement, with two differences:

1) The cached items change infrequently, say at most once per month or a few times per year, and the number of entities in the cache would not exceed 1000 (Java objects, say).
2) The event load we are looking at is around 10-50k events/sec.

We are using the broadcast mechanism for this.

Prasanna.

On Thu 26 Nov, 2020, 14:01 Navneeth Krishnan, <reachnavnee...@gmail.com> wrote:

> Hi All,
>
> We have a Flink streaming job processing around 200k events per second.
> The job requires a lot of infrequently changing data (essentially static,
> but there will be some changes over time, say a 5% change once per day
> or so). There are about 12 caches, some containing approximately 20k
> entries and a few with about 2 million entries.
>
> In the current implementation we use an in-memory, lazily loaded static
> cache to populate the data, and initialization happens in the open
> function. We chose this approach because we have allocated around 4 GB
> of extra memory per TM for these caches, and if a TM has 6 slots the
> cache can be shared.
>
> The issue we have with this approach is that every time a container is
> restarted or a new job is deployed, the cache has to be populated again.
> Sometimes this lazy loading takes a while and causes back pressure as
> well. We were thinking of moving this logic to a broadcast stream, but
> since the data has to be stored per slot, that would increase memory
> consumption considerably.
>
> Another option we were considering is to replace the current near/far
> cache, which uses a REST API to load the data, with a Redis-based
> near/far cache. This would definitely reduce the overall loading time,
> but it is still not a perfect solution.
>
> Are there any recommendations on how this can be achieved effectively?
> Also, how is everyone else overcoming this problem?
>
> Thanks,
> Navneeth
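For readers following the thread: the "shared across 6 slots per TM" approach from the quoted mail can be sketched in plain Java. All slots in one TaskManager JVM share a single static map, so the data is loaded once per TM rather than once per slot. This is a minimal sketch, not Flink API; the class name and the loader function (standing in for the REST call) are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of a per-JVM (per-TaskManager) shared lazy cache.
// ConcurrentHashMap.computeIfAbsent guarantees the loader runs at most
// once per key, even when several slots hit open() concurrently.
public class SharedLazyCache {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    public static Object get(String key, Function<String, Object> loader) {
        // loader stands in for the expensive REST fetch described above.
        return CACHE.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        // Simulate two slots requesting the same entry; loader runs once.
        final int[] loads = {0};
        Object a = get("customers", k -> { loads[0]++; return "customer-data"; });
        Object b = get("customers", k -> { loads[0]++; return "customer-data"; });
        System.out.println(a.equals(b) && loads[0] == 1); // prints true
    }
}
```

The trade-off matches the one described in the mail: memory is paid once per TM, but the cache still has to be repopulated whenever the container restarts.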
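The near/far option discussed above can be sketched the same way: a small in-process "near" map with a TTL sits in front of a slow "far" source (the REST API today, Redis in the proposal). This is a hedged sketch in plain Java; farLookup is a stand-in for the remote call, and none of the names are part of any Flink or Redis API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a near/far cache: hits are served from the local map,
// misses and stale entries go to the far store and refresh the near copy.
public class NearFarCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> near = new ConcurrentHashMap<>();
    private final Function<K, V> farLookup; // stand-in for REST/Redis lookup
    private final long ttlMillis;

    public NearFarCache(Function<K, V> farLookup, long ttlMillis) {
        this.farLookup = farLookup;
        this.ttlMillis = ttlMillis;
    }

    public V get(K key) {
        long now = System.currentTimeMillis();
        Entry<V> e = near.get(key);
        if (e == null || now - e.loadedAt > ttlMillis) {
            // Miss or expired: fetch from the far store and cache locally.
            e = new Entry<>(farLookup.apply(key), now);
            near.put(key, e);
        }
        return e.value;
    }
}
```

Swapping the far store from a REST API to Redis changes only the farLookup function here, which is why it speeds up reload but, as the mail notes, does not remove the cold-start problem.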