[DISCUSS] JDBC exactly-once sink

Roman Khachatryan Mon, 06 Jan 2020 00:41:43 -0800

Hi everyone,

I'm currently working on an exactly-once JDBC sink implementation for Flink.
Any ideas and/or feedback are welcome.


I've considered the following options:
1. Two-phase commit. This is similar to the Kafka sink.
XA or a database-specific API can be used. In the case of XA, each sink subtask
acts as a transaction manager, and each checkpoint-subtask pair corresponds
to an XA transaction (with a single branch).
2. Write-ahead log. This is similar to the Cassandra sink.
Transaction metadata needs to be stored in the database along with the data to
avoid adding duplicates after recovery.
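
To make the checkpoint-subtask to XA transaction mapping in option 1 more concrete, here is a minimal sketch of how a transaction id could be derived from such a pair. The class name CheckpointXid, the format id, and the byte layout are assumptions for illustration only, not existing Flink API. A real implementation would probably also need a job identifier in the id so that different jobs writing to the same database don't collide.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import javax.transaction.xa.Xid;

/** Hypothetical Xid built from a (checkpointId, subtaskIndex) pair; illustration only. */
final class CheckpointXid implements Xid {
    // Arbitrary format id picked for this sketch (assumption).
    private static final int FORMAT_ID = 0x464C4E4B;

    private final byte[] globalTransactionId;
    private final byte[] branchQualifier;

    CheckpointXid(long checkpointId, int subtaskIndex) {
        // 8 bytes of checkpoint id + 4 bytes of subtask index, well below Xid.MAXGTRIDSIZE (64).
        this.globalTransactionId = ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
                .putLong(checkpointId)
                .putInt(subtaskIndex)
                .array();
        // Each transaction has a single branch, so a constant qualifier is enough here.
        this.branchQualifier = new byte[] {0};
    }

    @Override
    public int getFormatId() {
        return FORMAT_ID;
    }

    @Override
    public byte[] getGlobalTransactionId() {
        return globalTransactionId.clone();
    }

    @Override
    public byte[] getBranchQualifier() {
        return branchQualifier.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CheckpointXid)) {
            return false;
        }
        CheckpointXid other = (CheckpointXid) o;
        return Arrays.equals(globalTransactionId, other.globalTransactionId)
                && Arrays.equals(branchQualifier, other.branchQualifier);
    }

    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(globalTransactionId) + Arrays.hashCode(branchQualifier);
    }
}
```

Because the id is a pure function of the pair, a restarted subtask can rebuild the same Xid for a given checkpoint and then commit or roll back the transaction it had prepared before the failure.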

For some scenarios, WAL might be better, but in general, XA seems to be a
better option.

==================
XA vs WAL comparison
==================

1. Consistency: XA preferable
WAL: longer inconsistency windows when writing from several sink subtasks

2. Performance and efficiency: XA preferable (depends on the use case)
XA:
- long-running transactions may delay GC and may hold locks (depends on the
use case)
- databases/drivers may have XA implementation issues
WAL:
- double (de)serialization and IO (first to flink state, then to database)
- read-from-state and write-to-database spikes on checkpoint completion
Both may cause read spikes in the consumer

3. Database support: XA preferable
XA: most popular RDBMS do support it (at least mysql, pgsql, mssql, oracle,
db2, sybase)
WAL: meta table DDL may differ

4. Operability: depends on the use case
XA:
- increased undo segment (db may need to maintain a view from the
transaction start)
- abandoned transactions cleanup (abandoned tx may cause starvation if for
example database blocks inserts of duplicates in different transactions)
- (jars aren't an issue - most drivers ship XA implementation)
WAL:
- increased intermediate flink state
- need to maintain meta table
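
The role of the meta table in the WAL approach can be sketched with a small in-memory model. This is not JDBC code: WalTargetModel is a hypothetical stand-in for the target database, and keying the meta table by checkpoint id and subtask index is an assumption. In a real database, the meta-table check, the data insert, and the meta-table insert would all happen within one transaction.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** In-memory stand-in for a database with a data table and a meta table; illustration only. */
final class WalTargetModel {
    private final List<String> dataTable = new ArrayList<>();
    // Meta table: one entry per batch that has already been applied (key layout is an assumption).
    private final Set<String> metaTable = new HashSet<>();

    /**
     * Applies the rows buffered for a checkpoint unless they were applied before.
     * Returns true if the rows were written, false if the batch was a replay.
     */
    synchronized boolean commitBatch(long checkpointId, int subtaskIndex, List<String> rows) {
        String key = checkpointId + "/" + subtaskIndex;
        if (metaTable.contains(key)) {
            return false; // already applied before the failure, skip to avoid duplicates
        }
        dataTable.addAll(rows);
        metaTable.add(key);
        return true;
    }

    /** Returns a copy of the rows currently in the data table. */
    synchronized List<String> rows() {
        return new ArrayList<>(dataTable);
    }
}
```

If the sink fails after writing a batch but before Flink learns about it, the batch is replayed from state after recovery, and the meta-table lookup turns the second attempt into a no-op.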

5. Simplicity: about the same
XA: more corner cases
WAL: state and meta table management
Both wrap writes into transactions

6. Testing: WAL preferable
XA requires MVCC and proper XA support (no jars needed for derby)

--
Regards,
Roman
