Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2629#discussion_r86528549

    --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
    @@ -540,15 +540,12 @@ private boolean performCheckpoint(CheckpointMetaData checkpointMetaData) throws
             synchronized (lock) {
                 if (isRunning) {
    +                checkpointState(checkpointMetaData);

    -                // Since both state checkpointing and downstream barrier emission occurs in this
    -                // lock scope, they are an atomic operation regardless of the order in which they occur.
    -                // Given this, we immediately emit the checkpoint barriers, so the downstream operators
    -                // can start their checkpoint work as soon as possible
    +                // broadcast barriers after snapshot operators' states.
                     operatorChain.broadcastCheckpointBarrier(
    -                        checkpointMetaData.getCheckpointId(), checkpointMetaData.getTimestamp());
    -
    -                checkpointState(checkpointMetaData);
    +                        checkpointMetaData.getCheckpointId(), checkpointMetaData.getTimestamp()
    +                );
    --- End diff --

    Disabling chaining does not necessarily mean that the data is sent over the network then. Flink will still try to schedule the tasks to the same machine.

    Doing a benchmark is a good idea. I think we've also done it in the past and we concluded that the reentrant lock is too expensive if I remember correctly.
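
    A benchmark like the one suggested above could start from the rough sketch below, which compares uncontended acquisition of a `synchronized` monitor with `java.util.concurrent.locks.ReentrantLock`. The class `LockCostSketch`, its method names, and the iteration count are hypothetical and not part of Flink. A hand-rolled timing loop like this is only indicative, since the JIT may coarsen or optimize the locks; a harness such as JMH would give more trustworthy numbers.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical micro-benchmark sketch; not Flink code.
class LockCostSketch {
    private static final Object MONITOR = new Object();
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Increments a counter once per iteration inside a synchronized block.
    static long runSynchronized(int iterations) {
        long counter = 0;
        for (int i = 0; i < iterations; i++) {
            synchronized (MONITOR) {
                counter++;
            }
        }
        return counter;
    }

    // Same work, but guarded by an explicit ReentrantLock.
    static long runReentrantLock(int iterations) {
        long counter = 0;
        for (int i = 0; i < iterations; i++) {
            LOCK.lock();
            try {
                counter++;
            } finally {
                LOCK.unlock();
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        int iterations = 10_000_000; // assumed value, tune as needed

        // Warm-up pass so both loops are JIT-compiled before timing.
        runSynchronized(iterations);
        runReentrantLock(iterations);

        long t0 = System.nanoTime();
        long a = runSynchronized(iterations);
        long t1 = System.nanoTime();
        long b = runReentrantLock(iterations);
        long t2 = System.nanoTime();

        // Print the counters as well so the loops cannot be discarded as dead code.
        System.out.printf("synchronized:  %d ms (count %d)%n", (t1 - t0) / 1_000_000, a);
        System.out.printf("ReentrantLock: %d ms (count %d)%n", (t2 - t1) / 1_000_000, b);
    }
}
```

    This only measures the single-threaded, uncontended case. The checkpoint lock in `StreamTask` is also taken by other threads, so a realistic comparison would need contention as well.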