Hey Ken, thanks for your message. Both your comments are correct (see inline).
On Fri, Nov 10, 2017 at 10:31 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:

> 1. A downstream function in the iteration was (significantly) increasing the
> number of tuples - it would get one in, and sometimes emit 100+.
>
> The output would loop back as input via the iteration.
>
> This eventually caused the network buffers to fill up, and that’s why the
> job got stuck.
>
> I had to add my own tracking/throttling in one of my custom functions, to
> avoid having too many “active” tuples.
>
> So maybe something to note in the documentation on iterations, if it’s not
> there already.

Yes, iterations are prone to deadlock due to the way that data is exchanged
between the iteration sink and head nodes. There have been multiple attempts
to fix these shortcomings, but I don't know what the latest state is. Maybe
Aljoscha (CC'd) has some input...

> 2. The back pressure calculation doesn’t take into account AsyncIO

Correct, the back pressure monitoring only takes the main task thread into
account (it works by sampling that thread's stack traces). Every operator
that uses a separate thread to emit records (like Async I/O or the Kafka
source) is therefore not covered by the back pressure monitoring.

– Ufuk
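P.S. For the iteration case, below is a rough sketch of the kind of in-flight
throttling Ken describes (my own illustration, not his actual code). It gates
new elements from entering the loop while too many tuples are still
circulating. To keep it short, the counter is shared via a static field, which
only works for local (single-JVM) execution; a real job would need a
per-subtask or external mechanism. The expansion rule and all names are made
up.

import java.util.concurrent.atomic.AtomicLong;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ThrottledIterationSketch {

    // Tuples currently inside the loop (entered but not yet retired).
    static final AtomicLong ACTIVE = new AtomicLong();
    static final long MAX_ACTIVE = 10_000;

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Gate: block new work from entering the loop while too much is in flight.
        DataStream<Long> gated = env.fromElements(100L, 200L, 300L)
                .flatMap(new FlatMapFunction<Long, Long>() {
                    @Override
                    public void flatMap(Long value, Collector<Long> out) throws Exception {
                        while (ACTIVE.get() >= MAX_ACTIVE) {
                            Thread.sleep(10);
                        }
                        ACTIVE.incrementAndGet();
                        out.collect(value);
                    }
                });

        // 5s max wait for feedback so the sketch can terminate on bounded input.
        IterativeStream<Long> iteration = gated.iterate(5000L);

        // Expansion step: one element in, possibly many out.
        DataStream<Long> expanded = iteration.flatMap(new FlatMapFunction<Long, Long>() {
            @Override
            public void flatMap(Long value, Collector<Long> out) {
                int fanOut = value > 1 ? 2 : 0; // hypothetical expansion rule
                for (int i = 0; i < fanOut; i++) {
                    out.collect(value / 2);
                }
                // This input is consumed (-1); each emitted tuple is new (+fanOut).
                ACTIVE.addAndGet(fanOut - 1);
            }
        });

        // Elements > 1 feed back into the loop, the rest leave it.
        iteration.closeWith(expanded.filter(v -> v > 1));

        expanded.filter(v -> v <= 1)
                .map(new MapFunction<Long, Long>() {
                    @Override
                    public Long map(Long value) {
                        ACTIVE.decrementAndGet(); // tuple has left the loop
                        return value;
                    }
                })
                .print();

        env.execute("throttled-iteration-sketch");
    }
}

The important part is that the gate blocks outside the loop, so the loop body
keeps draining and the counter can still go down while new input is held back.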
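For the Async I/O part there is no monitoring hook, but the operator's
capacity argument is the practical way to bound in-flight requests, so the
operator stalls before it floods the external system, even though the back
pressure tab won't attribute that to back pressure. Again just a sketch, using
the newer ResultFuture-based API, with the external lookup stubbed out and all
names hypothetical:

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncCapacitySketch {

    /** Hypothetical async lookup; the real client call is stubbed out. */
    static class AsyncLookup extends RichAsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
            CompletableFuture
                    .supplyAsync(() -> "value-for-" + key) // stand-in for a real client
                    .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
        }
    }

    static DataStream<String> withAsyncLookup(DataStream<String> keys) {
        // capacity = 100: at most 100 requests in flight per subtask. When the
        // queue is full the operator waits, but that wait happens outside the
        // sampled main-thread buffer requests, so the UI won't show it.
        return AsyncDataStream.unorderedWait(
                keys, new AsyncLookup(), 30, TimeUnit.SECONDS, 100);
    }
}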