Re: [PR] [FLINK-34820][postgressql] Not recycle but reuse fetcher for all data to improve performant [flink-cdc]

via GitHub Thu, 10 Apr 2025 17:59:33 -0700


hql0312 commented on PR #3983:
URL: https://github.com/apache/flink-cdc/pull/3983#issuecomment-2795541630


   > > And how does the changed code implement that logic?
   > 
   > Hi @Mrart, @hql0312. This PR mainly targets the most common situation. As I 
discussed in #2571, during the snapshot phase the fetcher is closed after each 
snapshot split finishes. A newly added snapshot split then opens a new 
fetcher, which has to look up the schema again. 
[image: Snipaste_2025-04-10_19-56-51]
   > 
   > What you describe is a special case that arises when newly added tables 
are enabled. To be honest, it is not a good idea for a reader to frequently 
switch between the stream split and snapshot splits (unless the parallelism is 
just 1). The switch is expensive. I have two ideas:
   > 
   > 1. The first idea: do not assign newly added snapshot splits to the 
reader that is handling the binlog. I will soon push forward a FLIP to let the 
enumerator know the current split distribution.
   > 2. The second idea: the splitReader can read the snapshot split and the 
stream split into a queue at the same time; there is no need to read only one 
at a time. This would require changing the current thread model.
   > 
   > But this PR still improves things a lot, so we can merge it first.
   
   If we can hold the stream split context when a snapshot split arrives, 
instead of recycling it, performance can improve.
   
   In the stream phase, the pg connector spends too much time loading the 
schema. If we can keep the schema in a cache, that cost will be reduced.
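
A minimal sketch of the caching idea, not the connector's actual code. The 
class `SchemaCacheSketch`, its methods, and the `String` standing in for a 
schema object are all hypothetical names chosen for illustration. It only shows 
that a cache that outlives individual splits performs the expensive lookup 
once per table:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from flink-cdc or Debezium.
final class SchemaCacheSketch {
    // Cached schema per table id; a String stands in for the real schema object.
    private final Map<String, String> schemaByTable = new ConcurrentHashMap<>();
    // Stand-in for the expensive schema lookup against PostgreSQL.
    private final Function<String, String> loader;
    // Counts how many times the expensive lookup actually ran.
    private final AtomicInteger loadCount = new AtomicInteger();

    SchemaCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the cached schema, loading it only the first time a table is seen.
    String schemaFor(String tableId) {
        return schemaByTable.computeIfAbsent(tableId, id -> {
            loadCount.incrementAndGet();
            return loader.apply(id);
        });
    }

    // Drops a cached entry, e.g. once a schema change makes it stale.
    void invalidate(String tableId) {
        schemaByTable.remove(tableId);
    }

    int loadCount() {
        return loadCount.get();
    }

    public static void main(String[] args) {
        SchemaCacheSketch cache = new SchemaCacheSketch(id -> "schema-of-" + id);
        cache.schemaFor("public.orders"); // first split: runs the lookup
        cache.schemaFor("public.orders"); // later split: served from the cache
        System.out.println("loads = " + cache.loadCount()); // prints "loads = 1"
    }
}
```

A real cache would also need invalidation when the table schema changes, which 
the `invalidate` method only hints at.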
   
   Is that logic right?
   @loserwang1024 


