[ https://issues.apache.org/jira/browse/FLINK-29545 ]
xiaogang zhou deleted comment on FLINK-29545:
---------------------------------------

    was (Author: zhoujira86):
[~pnowojski] [https://github.com/apache/flink/pull/21080]

Hi Master, I have modified the commit a little bit. The client side will send a heartbeat frame if it detects that no packet was sent within some time. Thus the server can detect network-related issues even if the business flow has stopped for a while. This commit is meant to detect the consuming-stop issue that has haunted us for a long time. But with this commit, the consuming could stop immediately....

It would be very kind of you if you could take a few minutes to review it and give suggestions, so that we can use this in our own environment quickly. Thanks a lot for your time, it will be valued!

> kafka consuming stop when trigger first checkpoint
> --------------------------------------------------
>
>                 Key: FLINK-29545
>                 URL: https://issues.apache.org/jira/browse/FLINK-29545
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Network
>    Affects Versions: 1.13.3
>            Reporter: xiaogang zhou
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: backpressure 100 busy 0.png, task acknowledge na.png,
> task dag.png
>
>
> The task DAG is shown in the attached file. The task is started to consume from
> the earliest offset, and it stops when the first checkpoint triggers.
>
> Is this normal? The sink is busy 0 and the second operator has 100 backpressure.
>
> Checking the checkpoint summary, we can find that some of the subtasks are n/a.
> I tried to debug this issue and found that in
> triggerCheckpointAsync, the
> triggerCheckpointAsyncInMailbox took a lot of time to call.
>
> It looks like this has something to do with
> logCheckpointProcessingDelay. Is there any fix for this issue?
>
> Can anybody help me with this issue?
>
> Thanks

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
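
The heartbeat idea in the deleted comment (send a heartbeat frame when nothing has been written for a while, so the peer can tell a dead connection from a quiet one) can be sketched roughly as below. This is only an illustration under stated assumptions: the class `IdleTracker`, its method names, and the millisecond timeouts are hypothetical and are not taken from the pull request or from Flink's code.

```java
/**
 * Hypothetical helper (not from the pull request): tracks how long a
 * connection has been quiet. The caller supplies the current time, so the
 * class has no dependency on a real clock.
 */
final class IdleTracker {
    private final long timeoutMillis;
    private long lastActivityMillis;

    IdleTracker(long timeoutMillis, long nowMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastActivityMillis = nowMillis;
    }

    /** Record that a frame was written (client side) or received (server side). */
    void onActivity(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    /** True once at least timeoutMillis have passed since the last recorded activity. */
    boolean isIdle(long nowMillis) {
        return nowMillis - lastActivityMillis >= timeoutMillis;
    }
}
```

In this sketch the client would poll `isIdle` on a tracker fed by its writes, and send a heartbeat frame (then call `onActivity`) whenever it returns true. The server would keep a second tracker fed by its reads, with a longer timeout, and treat `isIdle` as a sign that the connection is broken. Netty, which Flink's network stack is built on, provides `IdleStateHandler` for this kind of idle detection; whether the pull request uses it is not stated in the comment.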