Github user EronWright commented on the issue:

    https://github.com/apache/flink/pull/5427
  
    I feel that we're not addressing the core issue we're trying to fix. Consider the following sequence:
    1. A new job starts up with checkpointing enabled and a hook-based source.
    2. The source begins to consume events, mutating some external state.
    3. _Before the first checkpoint_, a task throws an exception, causing a global restart.
    4. Since the hook has no opportunity to rewind the external state to its initial conditions, data loss occurs.
    
    The above is a special case. In the normal case, one or more checkpoints have completed before the restart, so the hook's `restore` method is effective.
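
    To make the difference between the two cases concrete, here is a minimal, self-contained sketch. It is a toy simulation, not Flink code: `ExternalState`, `Hook`, `snapshot`, and `eventsLost` are hypothetical names invented for illustration, and the only assumption carried over from the scenario is that the hook's `restore` is invoked only when a checkpoint exists.

```java
import java.util.OptionalLong;

public class RestartBeforeFirstCheckpoint {

    /** Hypothetical stand-in for external state mutated by the source (e.g. a shared read position). */
    static final class ExternalState {
        long position = 0;
    }

    /** Hypothetical hook: it can only rewind external state to a checkpointed position. */
    static final class Hook {
        private final ExternalState external;

        Hook(ExternalState external) {
            this.external = external;
        }

        long snapshot() {
            return external.position;
        }

        void restore(long checkpointedPosition) {
            external.position = checkpointedPosition;
        }
    }

    /** Simulates consume, optional checkpoint, failure, global restart; returns events never re-read. */
    static long eventsLost(boolean checkpointBeforeFailure) {
        ExternalState external = new ExternalState();
        Hook hook = new Hook(external);
        OptionalLong lastCheckpoint = OptionalLong.empty();

        external.position += 3; // the source consumes 3 events
        if (checkpointBeforeFailure) {
            lastCheckpoint = OptionalLong.of(hook.snapshot());
        }
        external.position += 2; // 2 more events are consumed, then a task fails

        // Global restart: job state rolls back to the last checkpoint, or to empty state if none exists.
        long jobStatePosition = lastCheckpoint.orElse(0L);

        // Assumption from the scenario: restore runs only when there is a checkpoint to restore.
        if (lastCheckpoint.isPresent()) {
            hook.restore(lastCheckpoint.getAsLong());
        }

        // Events between the job's view and the external position are skipped on resume.
        return external.position - jobStatePosition;
    }

    public static void main(String[] args) {
        System.out.println("lost with a checkpoint: " + eventsLost(true));
        System.out.println("lost without a checkpoint: " + eventsLost(false));
    }
}
```

    In this sketch the checkpointed run rewinds the external position and loses nothing, while the run that fails before the first checkpoint leaves the external position ahead of the job's (empty) state, which is the data loss described in step 4.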


