Hi,

We're building a Flink job to process metric data from client devices. We
need to enrich these events via an external HTTP API.  We were thinking
we'd use Flink state as a cache of this enrichment data to reduce the load
on the external service.  It seems AsyncFunctions do not support keyed
state at the moment, so that's out.

I've found this AWS blog describing a possible solution:
https://aws.amazon.com/blogs/big-data/implement-apache-flink-real-time-data-enrichment-patterns/


In the post, they make synchronous HTTP calls in a KeyedProcessFunction,
and it seems to perform well.  I've read conflicting information on this
pattern, though: answers to similar questions on Stack Overflow say that
synchronous requests like this are a bad idea because they can block
checkpointing, among other things.

Are there any recommended patterns to do something like this without
compromising Flink's fault tolerance?  Our enrichment data is somewhat
expensive to build and would be requested pretty frequently, but is fairly
long-lived (~24 hr TTL).  So, caching is a requirement to avoid negatively
impacting the other system. I suppose we could implement caching between
the API and Flink, but we were hoping to avoid something like that.
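
To make the requirement concrete, here is a rough plain-Java sketch of the
caching behaviour we're after, with no Flink dependencies.  All names in it
(EnrichmentCache, lookup, lookupCount) are made up for illustration, and the
lookup function just stands in for the external HTTP call.  In the real job
we'd presumably hold the cached value in keyed state with a state TTL
instead of a HashMap, but the hit/miss/expiry logic would be the same idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical per-key enrichment cache with a TTL; illustrative only. */
public class EnrichmentCache {

    /** Cached enrichment value plus the time at which it stops being valid. */
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<String, String> lookup; // stand-in for the HTTP API call
    private final long ttlMillis;
    private final LongSupplier clock;              // injectable so expiry is testable
    private int lookups = 0;

    public EnrichmentCache(Function<String, String> lookup,
                           long ttlMillis,
                           LongSupplier clock) {
        this.lookup = lookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, calling the external lookup only on a miss or after expiry. */
    public String get(String deviceId) {
        long now = clock.getAsLong();
        Entry entry = entries.get(deviceId);
        if (entry == null || now >= entry.expiresAtMillis) {
            String fresh = lookup.apply(deviceId);
            lookups++;
            entry = new Entry(fresh, now + ttlMillis);
            entries.put(deviceId, entry);
        }
        return entry.value;
    }

    /** Number of times the external lookup was actually invoked. */
    public int lookupCount() {
        return lookups;
    }
}
```

With a 24-hour TTL, repeated events for the same device would hit the
external service at most about once a day per key, which is the load
profile we're aiming for.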

Thanks!

Nate
