On Fri, Apr 14, 2017 at 1:28 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 4/10/17 13:28, Fujii Masao wrote:
>> src/backend/replication/logical/launcher.c
>>     * Worker started and attached to our shmem. This check is safe
>>     * because only launcher ever starts the workers, so nobody can steal
>>     * the worker slot.
>>
>> The tablesync patch enabled even worker to start another worker.
>> So the above assumption is not valid for now.
>>
>> This issue seems to cause the corner case where the launcher picks up
>> the same worker slot that previously-started worker has already picked
>> up to start another worker.
>
> I think what the comment should rather say is that workers are always
> started through logicalrep_worker_launch() and worker slots are always
> handed out while holding LogicalRepWorkerLock exclusively, so nobody can
> steal the worker slot.
>
> Does that make sense?

No, unless I'm missing something.

logicalrep_worker_launch() picks an unused worker slot (one whose proc
is NULL) while holding LogicalRepWorkerLock, but it releases the lock
before the slot is marked as used (i.e., before the slot's proc is set
to non-NULL). The slot is marked as used only later, when the newly
launched worker calls logicalrep_worker_attach(). So if another
logicalrep_worker_launch() runs after LogicalRepWorkerLock has been
released but before the slot is marked as used, it can pick the same
slot, because that slot still looks unused.

Regards,

-- 
Fujii Masao
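
To make the window described above concrete, here is a minimal
standalone sketch. It is not the launcher.c code: SketchSlot, MAX_SLOTS
and the sketch_* functions are made up for illustration, the lock is
indicated only in comments, and the interleaving is replayed
sequentially rather than with real processes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 4     /* hypothetical; stands in for the real slot array size */

/* Hypothetical stand-in for a worker slot; only "proc" matters here. */
typedef struct SketchSlot
{
    const char *proc;   /* NULL means the slot looks unused */
} SketchSlot;

static SketchSlot slots[MAX_SLOTS];     /* zero-initialized: all unused */

/*
 * Launch step: choose the first slot whose proc is NULL.  The search is
 * assumed to run under LogicalRepWorkerLock; the point is that the lock
 * is released by the time we return, and proc is still NULL.
 */
static int
sketch_launch(void)
{
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (slots[i].proc == NULL)
            return i;
    }
    return -1;          /* no free slot */
}

/* Attach step: the started worker marks its slot as used, later on. */
static void
sketch_attach(int slot, const char *proc)
{
    assert(slot >= 0 && slot < MAX_SLOTS);
    slots[slot].proc = proc;
}

/*
 * Problematic interleaving: two launches happen before either worker
 * attaches.  Returns 1 if both launches chose the same slot.
 */
static int
sketch_both_pick_same_slot(void)
{
    int by_launcher = sketch_launch();
    int by_worker = sketch_launch();    /* slot still looks unused */

    sketch_attach(by_launcher, "worker started by launcher");
    sketch_attach(by_worker, "worker started by another worker");
    return by_launcher == by_worker;
}
```

In this sketch the second sketch_launch() call sees the same NULL proc
as the first one, so both callers end up with the same slot index; only
after an attach does a later launch move on to the next slot.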