Github user EronWright commented on the issue:

https://github.com/apache/flink/pull/5427

I feel that we're not addressing the core issue that we're trying to fix.

1. A new job starts up with checkpointing enabled and a hook-based source.
2. The source begins to consume events, causing some external state to become mutated.
3. _Before the first checkpoint_, a task throws an exception, causing a global restart.
4. Since the hook has no opportunity to rewind the external state to initial conditions, data loss occurs.

The above is a special case. In the normal case, one or more checkpoints have occurred before the restart occurs, and so the hook's `restore` method is effective.
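
To make the sequence concrete, here is a minimal, self-contained sketch of the two cases. It does not use Flink's actual hook interface: `ExternalReaderState`, `SketchHook`, and `positionAfterRestart` are hypothetical names invented for illustration, and the "external state" is assumed to be a simple read position.

```java
import java.util.OptionalLong;

final class RestartScenario {

    // Hypothetical stand-in for state held outside the job, e.g. a reader position.
    static final class ExternalReaderState {
        private long position = 0;

        void consumeEvent() { position++; }

        long position() { return position; }

        void rewindTo(long checkpointedPosition) { position = checkpointedPosition; }
    }

    // Hypothetical, simplified hook; NOT Flink's real interface.
    static final class SketchHook {
        private final ExternalReaderState external;

        SketchHook(ExternalReaderState external) { this.external = external; }

        // Records the external position as part of a checkpoint.
        long triggerCheckpoint() { return external.position(); }

        // Rewinds the external state to the checkpointed position.
        void restore(long checkpointedPosition) { external.rewindTo(checkpointedPosition); }
    }

    // Simulates one job run with a single failure and returns the external
    // position that the restarted job would resume from.
    static long positionAfterRestart(boolean checkpointBeforeFailure) {
        ExternalReaderState external = new ExternalReaderState();
        SketchHook hook = new SketchHook(external);
        OptionalLong latestCheckpoint = OptionalLong.empty();

        // Steps 1-2: the source consumes two events, mutating external state.
        external.consumeEvent();
        external.consumeEvent();

        // Normal case only: a checkpoint completes before the failure.
        if (checkpointBeforeFailure) {
            latestCheckpoint = OptionalLong.of(hook.triggerCheckpoint());
        }

        // Three more events are consumed, then a task fails (step 3).
        for (int i = 0; i < 3; i++) {
            external.consumeEvent();
        }

        // Global restart: restore runs only if a checkpoint exists.
        // With no checkpoint, nothing rewinds the external state (step 4).
        latestCheckpoint.ifPresent(hook::restore);
        return external.position();
    }

    public static void main(String[] args) {
        System.out.println(positionAfterRestart(true));   // prints 2
        System.out.println(positionAfterRestart(false));  // prints 5
    }
}
```

In the normal case the position is rewound to 2, so the events consumed after the checkpoint are read again. In the special case the position stays at 5 even though the job restarts from scratch, so in this sketch the first five events are never re-read.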
---