Hi I have a kafka topic with json messages that I map to protobufs within a data stream, and then send those to embedded stateful functions using the datastream integration api (DataStream[RoutableMessage]). From there I need to make an idempotent long-running blocking IO call.
I noticed that I was processing messages sequentially per kafka partition. Is there a way that I could process them sequentially by key only (but in parallel per partition)? I created some code that uses the embedded functions' registerAsyncOperation capabilities to make my long-running IO calls effectively asynchronous, but I had to add all this custom code to enqueue and persist any messages for a key that came in while there was an in-flight IO call happening for that key. I'm fairly confident that I can figure out all the fault tolerance cases _eventually_ (including re-sending the in-flight message upon getting the UNKNOWN status back from the async operation). That said, am I missing a trick that would allow Flink/statefun to take care of this "parallel per partition, sequential-per-key" behaviour? Remote functions don't seem to have the performance we need, even with async http transport. Many thanks! Fil