Roman Khachatryan created FLINK-19385:
-----------------------------------------
Summary: Channel recovery may deadlock
Key: FLINK-19385
URL: https://issues.apache.org/jira/browse/FLINK-19385
Project: Flink
Issue Type: Bug
Components: Runtime / Network, Runtime / Task
Affects Versions: 1.11.2, 1.12.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.12.0

Consider the following case:
* Two InputGates
* Input selection is not ALL (say FIRST initially)
* Unaligned checkpoints are ON
* On recovery, there are "parts" of records in all channels (actually, one is enough, I think)

What happens:
# StreamTask initiates recovery and schedules the partition request upon its end
# All gates and channels receive buffers from StateReader
# All channels of a single gate consume those state buffers, completing that gate's StateConsumedFuture
# InputProcessor returns NOTHING_AVAILABLE (see StreamTwoInputProcessor.getInputStatus)
# StreamTask suspends its default action
# The state of the 2nd gate is never consumed, so its StateConsumedFutures are never completed, so no partitions are requested

Solution: request partitions independently for each channel.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
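
For illustration only, the scenario in this issue can be sketched as a toy model. This is not Flink code: Channel, Gate, two_gates and recover are hypothetical names, and it is written in Python rather than Flink's Java to keep it short. It assumes that the task reads only from the selected gate and that, before the fix, partitions are requested only once every gate has consumed its recovered state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Channel:
    state_buffers: int                 # recovered buffers not yet consumed
    partition_requested: bool = False  # has this channel asked upstream for data?


@dataclass
class Gate:
    channels: List[Channel]

    def state_consumed(self) -> bool:
        # Toy stand-in for the gate's "state consumed" future being complete.
        return all(ch.state_buffers == 0 for ch in self.channels)


def two_gates() -> List[Gate]:
    # Two gates, two channels each, every channel holding one recovered buffer.
    return [Gate([Channel(1), Channel(1)]), Gate([Channel(1), Channel(1)])]


def recover(gates: List[Gate], selected: int, per_channel: bool) -> bool:
    """Drain recovered state of the selected gate; return True if that gate
    can receive new data afterwards (i.e. the task does not stall)."""
    for ch in gates[selected].channels:
        ch.state_buffers = 0
        if per_channel:
            # Proposed behaviour: each channel requests its partition as soon
            # as its own recovered state is consumed.
            ch.partition_requested = True

    if not per_channel and all(g.state_consumed() for g in gates):
        # Assumed original behaviour: one request for everything, but only
        # after all gates have consumed their state.
        for g in gates:
            for ch in g.channels:
                ch.partition_requested = True

    # With no requested partition on the selected gate there is nothing to
    # read, which corresponds to NOTHING_AVAILABLE and a suspended task.
    return any(ch.partition_requested for ch in gates[selected].channels)


print(recover(two_gates(), selected=0, per_channel=False))  # False: task stalls
print(recover(two_gates(), selected=0, per_channel=True))   # True: data can flow
```

In this model the second gate's state is never read in either run. The difference is that per-channel requests let the selected gate keep receiving data, so the task is not left waiting on a condition that can never become true.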