Filip Niksic created FLINK-14616:
------------------------------------

             Summary: Clarify the ordering guarantees in the "The Broadcast 
State Pattern"
                 Key: FLINK-14616
                 URL: https://issues.apache.org/jira/browse/FLINK-14616
             Project: Flink
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.9.1
            Reporter: Filip Niksic


When talking about the order of events in [The Broadcast State 
Pattern|https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/broadcast_state.html#important-considerations],
 the current documentation states that the downstream tasks must not assume the 
broadcast events to be ordered. However, this seems to be imprecise. According 
to the response I got from [~fhueske] to a 
[question|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Ordered-events-in-broadcast-state-tp30879.html]
 I sent to the Flink user mailing list:
{quote}The order of broadcasted inputs is not guaranteed when the operator that 
broadcasts its output has a parallelism > 1 because the tasks that receive the 
broadcasted input consume the records in "random" order from their input 
channels.
{quote}
In particular, when the parallelism of the broadcasting operator is 1, the 
order _is_ guaranteed.

[~fhueske] continues with his suggestions on how to ensure the correct ordering 
of the broadcast events:
{quote}So there are two approaches:
1) make the operator that broadcasts its output run as an operator with 
parallelism 1 (or add a MapOperator with parallelism 1 that just forwards its 
input). This will cause all broadcasted records to go through the same network 
channel and their order is guaranteed on each receiver.
2) use timestamps of broadcasted records for ordering and watermarks to reason 
about completeness.

If the broadcasted data is (comparatively) small in volume (which is usually 
given because otherwise broadcasting would be expensive), I'd go with the first 
option.
The second approach is more difficult to implement.
{quote}
It would be great if the ordering guarantees could be clarified to avoid 
confusion. This could be achieved by simply expanding the paragraph that talks 
about the order of events in the "important considerations" section. More 
ambitiously, the suggestions given by [~fhueske] could be turned into examples.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to