J9527H commented on PR #4246:
URL: https://github.com/apache/flink-cdc/pull/4246#issuecomment-4841561001

   Hi @ThorneANN @yuxiqian, thanks for working on this — we'd really benefit 
from this feature.
   
   We're running a production MySQL CDC pipeline using the DataStream API 
(`MySqlSource` + custom `DebeziumDeserializationSchema`) on Flink 2.2 with 
flink-cdc 3.6.0-2.2. Our job relies on custom parsing, filtering, and 
DLQ routing logic that's tightly coupled to the DataStream API — migrating to 
the Pipeline (YAML) connector isn't an option for us without losing that 
flexibility.
   
   Our real-world use case: we periodically need to add new tables to a 
long-running job, but **we don't need historical/snapshot data for those new 
tables** — only incremental binlog events going forward. Today our only options 
are:
   
   1. `scanNewlyAddedTableEnabled(true)` — always triggers a snapshot phase, 
which doesn't match our requirement, and per #2105 has also been reported to 
occasionally hang during the snapshot phase.
   2. Run a separate, independent job per newly-added table using 
`StartupOptions.latest()` — works, but doesn't scale operationally.
   3. Manually extract the last committed binlog offset from TaskManager 
checkpoint logs and cold-start with `StartupOptions.specificOffset(...)` — 
works, but is manual, error-prone, and not officially documented for this use 
case (adding tables vs. failure recovery).
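
   For context, the three workarounds correspond roughly to the builder 
configurations below. This is a minimal sketch rather than our production 
code: the class name, host, credentials, table names, and the binlog 
file/position are all placeholders, and `JsonDebeziumDeserializationSchema` 
stands in for our custom deserializer. It assumes the flink-cdc 3.x MySQL 
connector is on the classpath.

```java
import org.apache.flink.cdc.connectors.mysql.source.MySqlSource;
import org.apache.flink.cdc.connectors.mysql.source.MySqlSourceBuilder;
import org.apache.flink.cdc.connectors.mysql.table.StartupOptions;
import org.apache.flink.cdc.debezium.JsonDebeziumDeserializationSchema;

// Hypothetical class name; every literal below is a placeholder.
public class NewTableWorkarounds {

    // Shared connection settings for all three variants.
    private static MySqlSourceBuilder<String> base(String... tables) {
        return MySqlSource.<String>builder()
                .hostname("mysql.example.internal")
                .port(3306)
                .databaseList("app_db")
                .tableList(tables)
                .username("cdc_user")
                .password("change-me")
                // Stand-in for a custom DebeziumDeserializationSchema.
                .deserializer(new JsonDebeziumDeserializationSchema());
    }

    // Workaround 1: existing job with the table list extended.
    // The newly added table goes through a snapshot phase first.
    static MySqlSource<String> withNewlyAddedTableScan() {
        return base("app_db.orders", "app_db.new_table")
                .scanNewlyAddedTableEnabled(true)
                .build();
    }

    // Workaround 2: a separate job for the new table only,
    // reading binlog events from the latest offset onward.
    static MySqlSource<String> separateJobFromLatest() {
        return base("app_db.new_table")
                .startupOptions(StartupOptions.latest())
                .build();
    }

    // Workaround 3: cold start from a manually recovered binlog offset.
    // The file name and position are placeholders, not real values.
    static MySqlSource<String> coldStartFromOffset() {
        return base("app_db.orders", "app_db.new_table")
                .startupOptions(StartupOptions.specificOffset("mysql-bin.000003", 4L))
                .build();
    }

    // Requested here, not callable on MySqlSourceBuilder today:
    //     .scanBinlogNewlyAddedTableEnabled(true)
}
```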
   
   Having `scanBinlogNewlyAddedTableEnabled` on `MySqlSourceBuilder`, 
consistent with what's already available in `MySqlDataSourceFactory` for the 
Pipeline connector, would let us avoid all three workarounds above.
   
   Happy to test against a patched build if that helps move this forward. Let 
me know if there's anything I can do to help (testing, providing more context 
on our use case, etc.).

