Thanks a lot for your quick response! Your suggestion, however, would not work for our use case. Ours is a streaming system that must process 100,000 messages per second and produce results immediately, so rerunning the job is simply not an option.
Our job is a streaming job broken down into several operators, with a very strict latency requirement (less than 10 seconds at all times). There can be multiple messages for a given entity in quick succession, and processing them in order is another strict requirement. The question is: how can we best leverage Flink's stateful stream processing together with async I/O?