We are using the Beam 2.20 Python SDK on a Flink 1.9 runner. Our messages originate from a custom source that consumes messages from a Kafka topic and emits them in the order of their Kafka offsets to a DoFn. After this DoFn processes the messages, they are emitted to a custom sink that sends messages to a Kafka topic.
We want to process those messages in the order in which we receive them from Kafka and then emit them to the Kafka sink in the same order, but based on our understanding Beam does not provide an in-order transport. However, in practice we noticed that with a Python SDK worker on Flink and a parallelism setting of 1 and one sdk_worker instance, messages seem to be both processed and emitted in order. Is that implementation-specific in-order behavior something that we can rely on, or is it very likely that this will break at some future point? In case it's not recommended to depend on that behavior what is the best approach for in-order processing? https://stackoverflow.com/questions/45888719/processing-total-ordering-of-events-by-key-using-apache-beam recommends to order events in a heap, but according to our understanding this approach will only work when directly writing to an external system.