Hi all,

First post here, so please be kind :)
Firstly, some context. I have the following high-level job topology:

(1) FlinkPulsarSource -> (2) RichAsyncFunction -> (3) SinkFunction

1. The FlinkPulsarSource reads event notifications about article updates from a Pulsar topic.
2. The RichAsyncFunction fetches the “full” article from the specified URL endpoint and transmutes it into a “legacy” article format.
3. The SinkFunction writes the “legacy” article to a (legacy) web platform, i.e. the sink is effectively another web site.

I have this all up and running (despite lots of shading fun).

When the SinkFunction creates an article on the legacy platform, the platform responds with an “HTTP 201 - Created” and a suitably populated Location header. Now I need to persist that Location header and, more specifically, a map between the URLs on the new and legacy platforms. This is needed for later update and delete processing.

The question is: how do I store this mapping information?

I’ve spent some time trying to grok state management and the various backends, but from what I can see, state management is focused on “operator scoped” state. This seems reasonable given the requirement for barriers etc. to ensure accurate recovery. However, I need some persistence between operators (shared state?), with longevity beyond the processing of an operator.

My gut reaction is that I need an external K-V store such as Ignite (or similar). Frankly, given that Flink ships with embedded RocksDB, I was hoping to use that, but there seems to be no obvious way to do this, and lots of advice saying don’t :)

Am I missing something obvious here?

Many thanks in advance,
Paul
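
P.S. To make the requirement concrete, this is roughly the contract I have in mind for the mapping store. It is only a sketch: `UrlMappingStore` and `InMemoryUrlMappingStore` are names I have made up (they are not part of Flink or any library), and the in-memory map merely stands in for whatever durable store turns out to be appropriate.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract for the new-URL -> legacy-URL mapping.
// Not a Flink API; the names are invented for illustration.
interface UrlMappingStore {
    // Record the legacy URL (taken from the Location header) for a new-platform URL.
    void put(String newUrl, String legacyUrl);

    // Look up the legacy URL, e.g. when processing a later update.
    Optional<String> get(String newUrl);

    // Drop the mapping, e.g. when processing a delete; returns the removed legacy URL if any.
    Optional<String> remove(String newUrl);
}

// In-memory stand-in only: a real implementation would need to be durable
// and reachable from more than one operator instance.
final class InMemoryUrlMappingStore implements UrlMappingStore {
    private final Map<String, String> mappings = new ConcurrentHashMap<>();

    @Override
    public void put(String newUrl, String legacyUrl) {
        mappings.put(newUrl, legacyUrl);
    }

    @Override
    public Optional<String> get(String newUrl) {
        return Optional.ofNullable(mappings.get(newUrl));
    }

    @Override
    public Optional<String> remove(String newUrl) {
        return Optional.ofNullable(mappings.remove(newUrl));
    }
}
```

The idea would be that the sink calls `put(...)` once it has the 201 response, and the later update and delete processing calls `get(...)` and `remove(...)` respectively.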