Hi there,

So far, the checkpoint trigger is hardcoded in CheckpointCoordinator, which
fires periodically and pushes control messages to the task managers. It is
implemented orthogonally to the business logic implemented in jobs.

Our scenario requires control messages from the master pipeline to flow along
with the events in a distributed queue. When a follower pipeline source detects
a control message (checkpoint_barrier_id / watermark / restore_to_checkpoint_id),
it blocks itself, sends a message to the checkpoint coordinator and requests a
specific checkpoint. Once the checkpoint coordinator acknowledges the request
and the source has acked back to the coordinator, the source unblocks itself
and keeps going. The source would contain some message de-dupe logic that
always honors the first control message it sees and discards later duplicates,
given the assumption that events consumed from the distributed queue are
strongly ordered.
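
To make the flow above concrete, here is a minimal sketch of the follower
source side. It is only an illustration under assumptions, not existing Flink
code: CheckpointRequester, QueueRecord and FollowerSource are hypothetical
names, only the checkpoint barrier message is modelled, and the two-way ack
with the coordinator is collapsed into one blocking call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the coordinator side; not a Flink API. */
interface CheckpointRequester {
    /** Blocks until the requested checkpoint completes; returns true on ack. */
    boolean requestCheckpoint(long checkpointId);
}

/** Hypothetical queue record: either a data event or a checkpoint barrier. */
final class QueueRecord {
    final boolean isBarrier;
    final long checkpointId; // meaningful only when isBarrier is true
    final String payload;    // meaningful only when isBarrier is false

    private QueueRecord(boolean isBarrier, long checkpointId, String payload) {
        this.isBarrier = isBarrier;
        this.checkpointId = checkpointId;
        this.payload = payload;
    }

    static QueueRecord event(String payload) {
        return new QueueRecord(false, -1L, payload);
    }

    static QueueRecord barrier(long checkpointId) {
        return new QueueRecord(true, checkpointId, null);
    }
}

/** Follower source that consumes a strongly ordered queue one record at a time. */
final class FollowerSource {
    private final CheckpointRequester coordinator;
    private final Set<Long> seenBarriers = new HashSet<>();
    final List<String> emitted = new ArrayList<>();

    FollowerSource(CheckpointRequester coordinator) {
        this.coordinator = coordinator;
    }

    void process(QueueRecord record) {
        if (!record.isBarrier) {
            emitted.add(record.payload); // ordinary event: pass it downstream
            return;
        }
        // De-dupe: honor the first barrier with a given id, drop later copies.
        if (!seenBarriers.add(record.checkpointId)) {
            return;
        }
        // The source is blocked for as long as this call does not return.
        boolean acked = coordinator.requestCheckpoint(record.checkpointId);
        if (!acked) {
            throw new IllegalStateException(
                "checkpoint " + record.checkpointId + " was not acknowledged");
        }
        // Ack received: returning from here unblocks the source.
    }
}
```

Because process() is synchronous, no event after a barrier is emitted until
the coordinator has answered, which is the blocking behaviour described above.
The set of seen barrier ids grows without bound in this sketch; a real source
would probably only need to remember the latest id, given the ordering
assumption.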

Pros:

   - Allows pipeline authors to define customized checkpointing logic
   - Allows breaking a large pipeline down into smaller ones connected via
   a streaming queue
      - leverage data locality and run a sub-pipeline near its dependency
      - pipe late-arriving events to a side pipeline and consolidate them there

Cons:

   - Overhead of blocking the follower pipeline source
   - Overhead of distributed queue latency
   - Overhead of writing control messages to the distributed queue


Does that make sense?

Thanks,
Chen
