hql0312 commented on PR #3983: URL: https://github.com/apache/flink-cdc/pull/3983#issuecomment-2795541630
> > And how does the changed code implement the logic?
>
> Hi, @Mrart, @hql0312. This PR mainly targets the most common situations. As I discussed in #2571, in the snapshot split phase the fetcher is closed after each snapshot split finishes. A newly added snapshot split then opens a new fetcher and needs to look up the schema again.
>
> [image: Snipaste_2025-04-10_19-56-51]
>
> What you describe is a special case that arises when newly added tables are enabled. To be honest, it's not a good idea for a reader to switch frequently between the stream split and snapshot splits (unless the parallelism is just 1). The switch costs a lot. I have two ideas:
>
> 1. The first idea: do not assign newly added snapshot splits to the reader that is handling the binlog. I will soon push forward a FLIP to let the enumerator know the current split distribution.
> 2. The second idea: the SplitReader can read the snapshot split and the stream split into a queue at the same time; there is no need to read only one at a time. This requires changing the current thread model.
>
> But this PR can still improve things a lot, so we can let it in first.

If we can hold the stream split context when a snapshot split arrives, instead of recycling it, performance can improve.
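To make the idea concrete, here is a minimal sketch of keeping loaded schemas at the reader level so that they survive a fetcher being closed and reopened. Every name in it (`SchemaCacheSketch`, `SchemaCache`, `Fetcher`, the string-valued schema) is hypothetical and is not taken from flink-cdc; it only illustrates the caching pattern, not the actual change in this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: none of these names come from flink-cdc. */
public class SchemaCacheSketch {

    /** Cache owned by the reader, so it outlives any single fetcher. */
    static final class SchemaCache {
        private final Map<String, String> schemas = new ConcurrentHashMap<>();
        private final Function<String, String> loader;

        SchemaCache(Function<String, String> loader) {
            this.loader = loader;
        }

        /** Returns the cached schema, loading it only on the first request for a table. */
        String get(String tableId) {
            return schemas.computeIfAbsent(tableId, loader);
        }

        /** Drops one entry, e.g. after a schema change is observed (assumed policy). */
        void invalidate(String tableId) {
            schemas.remove(tableId);
        }
    }

    /** Stand-in for a fetcher that is created per split and closed afterwards. */
    static final class Fetcher implements AutoCloseable {
        private final SchemaCache cache;

        Fetcher(SchemaCache cache) {
            this.cache = cache;
        }

        String schemaFor(String tableId) {
            return cache.get(tableId);
        }

        @Override
        public void close() {
            // Closing the fetcher leaves the shared cache untouched.
        }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        // The loader stands in for the expensive schema lookup against the database.
        SchemaCache cache = new SchemaCache(tableId -> {
            lookups.incrementAndGet();
            return "schema-of-" + tableId;
        });

        // Three splits, each with its own short-lived fetcher.
        for (int split = 0; split < 3; split++) {
            try (Fetcher fetcher = new Fetcher(cache)) {
                fetcher.schemaFor("public.orders");
            }
        }
        System.out.println("lookups = " + lookups.get()); // prints "lookups = 1"
    }
}
```

In this sketch the expensive lookup runs once even though three fetchers are opened and closed; a real implementation would also have to decide when a cached schema must be invalidated.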
In the stream phase, the PG connector spends too much time loading the schema. If we can keep the schema in a cache, that cost will go down. Is this logic right? @loserwang1024

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org