Hi Chen! If I understand correctly, you want to implement a custom way of triggering checkpoints based on messages in the input message queue (for example, based on Kafka events)? Basically, to trigger a checkpoint once you have received a special message through each Kafka partition?
Please let me know if that is the correct understanding of your question.

That is an interesting use case. My feeling is that the simplest way to go about it would be the following:

  - Add to Flink a way to trigger a checkpoint via an RPC call. There is
    already such a call for triggering savepoints; a similar one could be
    added for checkpoints.

  - Implement a custom source that waits for these "trigger events". When
    it receives such an event, it blocks and sends an RPC call to a custom
    service you implement (a rough sketch of such a source is at the end
    of this mail, below the quoted message).

  - Once the service has received a call from all sources, it sends an RPC
    call to the checkpoint coordinator to trigger the checkpoint.

  - The sources continue after they have triggered the checkpoint.

Greetings,
Stephan

On Mon, Jul 18, 2016 at 12:40 AM, Chen Qin <qinnc...@gmail.com> wrote:

> Hi there,
>
> So far, the checkpoint trigger is hardcoded in the CheckpointCoordinator,
> which triggers checkpoints periodically and pushes control messages to the
> task managers. It is implemented orthogonally to the business logic of the
> jobs.
>
> Our scenario requires the master pipeline to flow control messages along
> with events through a distributed queue. When a follower pipeline source
> detects a control message (checkpoint_barrier_id / watermark /
> restore_to_checkpoint_id), it blocks itself, sends a message to the
> checkpoint coordinator, and requests a specific checkpoint. Once it gets
> an ack from the checkpoint coordinator and has acked back, the source
> unblocks itself and keeps going. It would contain message de-dupe logic
> that always honors the first control message seen and discards later
> duplicates, under the assumption that events consumed from the distributed
> queue are strictly ordered.
>
> Pros:
>
> - Allows the pipeline author to define customized checkpointing logic
> - Allows breaking a large pipeline down into smaller ones connected via a
>   streaming queue
> - Leverages data locality and runs a sub-pipeline near its dependency
> - Pipes late-arriving events to a side pipeline and consolidates them there
>
> Cons:
>
> - Overhead of blocking the follower pipeline sources
> - Overhead of the distributed queue's latency
> - Overhead of writing control messages to the distributed queue
>
> Does that make sense?
>
> Thanks,
> Chen
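P.S. To make the second and third steps concrete, here is a minimal sketch of
such a trigger-aware source. It is only an illustration: the
TriggerServiceClient interface and the checkpoint-trigger RPC behind it are
hypothetical stand-ins for the custom service and the (not yet existing)
coordinator call, and the record-parsing helpers are placeholders for your
real Kafka-facing logic. It also folds in the first-seen de-dupe rule from
your mail.

    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class TriggerAwareSource extends RichSourceFunction<String> {

        /** Client for the custom service you would implement (hypothetical API). */
        public interface TriggerServiceClient {
            /**
             * Blocks until every source subtask has reported this trigger id
             * and the service has asked the CheckpointCoordinator, via the
             * new RPC call, to take the checkpoint.
             */
            void reportTriggerAndAwaitCheckpoint(int subtask, long triggerId) throws Exception;
        }

        private final TriggerServiceClient client;
        private volatile boolean running = true;
        private long lastTriggerId = -1; // de-dupes repeated trigger events

        public TriggerAwareSource(TriggerServiceClient client) {
            this.client = client;
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            final int subtask = getRuntimeContext().getIndexOfThisSubtask();
            while (running) {
                String record = pollNextRecord();
                if (record == null) {
                    continue;
                }
                if (isTriggerEvent(record)) {
                    long triggerId = extractTriggerId(record);
                    // Honor only the first occurrence of a trigger id; the
                    // queue is assumed strictly ordered, so ids only grow.
                    if (triggerId <= lastTriggerId) {
                        continue;
                    }
                    lastTriggerId = triggerId;
                    // Block until all sources have reported this trigger and
                    // the checkpoint has been requested, then resume.
                    client.reportTriggerAndAwaitCheckpoint(subtask, triggerId);
                } else {
                    // Emit normal records under the checkpoint lock, as usual.
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(record);
                    }
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        // Placeholders: wire in your actual Kafka consumer and message format.
        private String pollNextRecord() { return null; }
        private boolean isTriggerEvent(String record) { return record.startsWith("TRIGGER:"); }
        private long extractTriggerId(String record) { return Long.parseLong(record.substring(8)); }
    }

One detail worth noting: the source blocks while *not* holding the checkpoint
lock, so once the service triggers the checkpoint, the coordinator can still
acquire the lock and inject the barrier into the stream.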