Hi

I have a kafka topic with json messages that I map to protobufs within a
data stream, and then send those to embedded stateful functions using the
datastream integration api (DataStream[RoutableMessage]). From there I need
to make an idempotent long-running blocking IO call.

I noticed that I was processing messages sequentially per kafka partition.
Is there a way that I could process them sequentially by key only (but in
parallel per partition)?

I created some code that uses the embedded functions'
registerAsyncOperation capabilities to make my long-running IO calls
effectively asynchronous, but I had to add all this custom code to enqueue
and persist any messages for a key that came in while there was an
in-flight IO call happening for that key. I'm fairly confident that I can
figure out all the fault tolerance cases _eventually_ (including re-sending
the in-flight message upon getting the UNKNOWN status back from the async
operation).

That said, am I missing a trick that would allow Flink/statefun to take
care of this "parallel per partition, sequential-per-key" behaviour? Remote
functions don't seem to have the performance we need, even with async http
transport.

Many thanks!
Fil

Reply via email to