dybyte commented on code in PR #10429: URL: https://github.com/apache/seatunnel/pull/10429#discussion_r2851746202
########## docs/en/architecture/fault-tolerance/checkpoint-mechanism.md: ########## @@ -0,0 +1,759 @@ +--- +sidebar_position: 1 +title: Checkpoint Mechanism +--- + +# Checkpoint Mechanism + +## 1. Overview + +### 1.1 Problem Background + +Distributed data processing systems face critical challenges for fault tolerance: + +- **State Loss**: How to preserve processing state across failures? +- **Exactly-Once**: How to ensure each record is processed exactly once? +- **Distributed Consistency**: How to create consistent snapshots across distributed tasks? +- **Performance**: How to checkpoint without blocking data processing? +- **Recovery**: How to efficiently restore state after failures? + +### 1.2 Design Goals + +SeaTunnel's checkpoint mechanism aims to: + +1. **Guarantee Exactly-Once Semantics**: Consistent state snapshots + two-phase commit +2. **Minimize Overhead**: Asynchronous checkpoint, no data processing blocking +3. **Fast Recovery**: Restore from latest checkpoint in seconds +4. **Distributed Coordination**: Coordinate checkpoints across hundreds of tasks +5. **Pluggable Storage**: Support multiple storage backends (HDFS, S3, Local, OSS) + +### 1.3 Theoretical Foundation + +SeaTunnel's checkpoint is based on the **Chandy-Lamport distributed snapshot algorithm**: + +**Key Idea**: Insert special markers (barriers) into data streams. When a task receives barrier: +1. Snapshot its local state +2. Forward barrier downstream +3. Continue processing + +Result: Globally consistent snapshot without pausing entire system. + +**Reference**: ["Distributed Snapshots: Determining Global States of Distributed Systems"](https://lamport.azurewebsites.net/pubs/chandy.pdf) (Chandy & Lamport, 1985) + +## 2. Architecture Design + +### 2.1 Checkpoint Architecture + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ JobMaster (per job) │ +│ │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ CheckpointCoordinator (per pipeline) │ │ +│ │ │ │ +│ │ • Trigger checkpoint (periodic/manual) │ │ +│ │ • Generate checkpoint ID │ │ +│ │ • Track pending checkpoints │ │ +│ │ • Collect task acknowledgements │ │ +│ │ • Persist completed checkpoints │ │ +│ │ • Cleanup old checkpoints │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ │ │ +│ │ (Trigger Barrier) │ +│ ▼ │ +└─────────────────────────────────────────────────────────────────┘ + │ + │ (CheckpointBarrier) + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Worker Nodes │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ SourceTask 1 │ │ SourceTask 2 │ │ SourceTask N │ │ +│ │ │ │ │ │ │ │ +│ │ 1. Receive │ │ 1. Receive │ │ 1. Receive │ │ +│ │ Barrier │ │ Barrier │ │ Barrier │ │ +│ │ 2. Snapshot │ │ 2. Snapshot │ │ 2. Snapshot │ │ +│ │ State │ │ State │ │ State │ │ +│ │ 3. ACK │ │ 3. ACK │ │ 3. ACK │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ │ (Barrier Propagation) │ │ +│ ▼ ▼ ▼ │ Review Comment: It might be better to add `4. Forward` in the SourceTask diagram. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
