+1 Qiegang
On Tue, Mar 3, 2026 at 7:57 PM Gengliang Wang <[email protected]> wrote: > Hi Spark devs, > > I'd like to call a vote on the SPIP*: Change Data Capture (CDC) Support* > > *Summary:* > > This SPIP proposes a unified approach by adding a CHANGES SQL clause and > corresponding DataFrame/DataStream APIs that work across DSv2 connectors. > > 1. Standardized User API > > SQL: > > -- Batch: What changed between version 10 and 20? > > SELECT * FROM my_table CHANGES FROM VERSION 10 TO VERSION 20; > > -- Streaming: Continuously process changes > > CREATE STREAMING TABLE cdc_sink AS > > SELECT * FROM STREAM my_table CHANGES FROM VERSION 0; > > DataFrame API: > > spark.read > > .option("startingVersion", "10") > > .option("endingVersion", "20") > > .changes("my_table") > > 2. Engine-Level Post Processing Under the hood, this proposal introduces > a minimal Changelog interface for DSv2 connectors. Spark's Catalyst > optimizer will take over the CDC post-processing, including: > > - > > Filtering out copy-on-write carry-over rows. > - > > Deriving pre-image/post-image updates from raw insert/delete pairs. > - > > Computing net changes. > > > *Relevant Links:* > > - *SPIP Doc: * > > https://docs.google.com/document/d/1-4rCS3vsGIyhwnkAwPsEaqyUDg-AuVkdrYLotFPw0U0/edit?usp=sharing > - *Discuss Thread: * > https://lists.apache.org/thread/dhxx6pohs7fvqc3knzhtoj4tbcgrwxts > - *JIRA: *https://issues.apache.org/jira/browse/SPARK-55668 > > > *The vote will be open for at least 72 hours. *Please vote: > > [ ] +1: Accept the proposal as an official SPIP > > [ ] +0 > > [ ] -1: I don't think this is a good idea because ... > > Thanks, > Gengliang Wang >
