On Thu, Aug 10, 2023 at 10:15 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Aug 9, 2023 at 8:28 AM Zhijie Hou (Fujitsu) > <houzj.f...@fujitsu.com> wrote: > > > > On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m.melihmu...@gmail.com> > > wrote: > > > > > Right. I attached the v26 as you asked. > > > > Thanks for posting the patches. > > > > While reviewing the patch, I noticed one rare case that it's possible that > > there > > are two table sync worker for the same table in the same time. > > > > The patch relies on LogicalRepWorkerLock to prevent concurrent access, but > > the > > apply worker will start a new worker after releasing the lock. So, at the > > point[1] > > where the lock is released and the new table sync worker has not been > > started, > > it seems possible that another old table sync worker will be reused for the > > same table. > > > > /* Now safe to release the LWLock */ > > LWLockRelease(LogicalRepWorkerLock); > > *[1] > > /* > > * If there are free sync worker slot(s), > > start a new sync > > * worker for the table. > > */ > > if (nsyncworkers < > > max_sync_workers_per_subscription) > > ... > > > > logicalrep_worker_launch(MyLogicalRepWorker->dbid, > > > > Yeah, this is a problem. I think one idea to solve this is by > extending the lock duration till we launch the tablesync worker but we > should also consider changing this locking scheme such that there is a > better way to indicate that for a particular rel, tablesync is in > progress. Currently, the code in TablesyncWorkerMain() also acquires > the lock in exclusive mode even though the tablesync for a rel is in > progress which I guess could easily heart us for larger values of > max_logical_replication_workers. So, that could be another motivation > to think for a different locking scheme. >
Yet another problem is that currently apply worker maintains a hash table for 'last_start_times' to avoid restarting the tablesync worker immediately upon error. The same functionality is missing while reusing the table sync worker. One possibility is to use a shared hash table to remember start times but I think it may depend on what we decide to solve the previous problem reported by Hou-San. -- With Regards, Amit Kapila.