I couldn't find any explicit documentation on whether a broadcast operator
might invoke processElement and processBroadcastElement concurrently. At
first I suspected it can, hence the different Contexts (read-write,
read-only). But the TwoInputStreamOperator
<https://github.com/apache/flink/blob/4c6e15f046ca3157fd92a51b81d316468dd8d221/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/TwoInputStreamOperator.java>
-
which I think the Broadcast operator utilizes? - explicitly states that the
methods will not be called concurrently.

For context, I am considering implementing essentially a caching layer on
top of the broadcast state. The main motivation is to avoid the
deserialization overhead when accessing the state. The broadcast objects
are large, so even making them pure POJO (which may be a significant
undertaking) could still be painful. The broadcast objects are also
guaranteed to come in a single message (a single List<Rule>) every so
often. We currently clear the Map<Integer, Rule> state, and fully
repopulate it on every broadcast. I would like to do the same with a purely
local / java Map<Integer, Rule> but want to be sure I don't run into any
races.

I also recognize that this is very close to playing with fire, and is
exactly why we have a broadcast state where Flink can hide all the danger,
so would be open to other ideas if this is cardinal sin!

Trystan

Reply via email to