On Thu, Jul 27, 2023 at 07:23:32PM -0700, Andres Freund wrote:
> As written, dynamic and static workers race each other. It doesn't make a lot
> of sense to me to use the same ids for either?
> 
> The attached patch reproduces the problem on master.
> 
> Note that without the sleep(3) in the test the workers don't actually finish
> starting, the test shuts down the cluster before that happens...

So you have faced a race condition where the commit of the transaction
doing the schema creation for the static workers is delayed long
enough that the dynamic workers don't see it, and bumped on a catalog
conflict when they try to create the same schemas.

Having each bgworker on its own schema would be enough to prevent
conflicts, but I'd like to add a second thing: a check on
pg_stat_activity.wait_event after starting the workers.  I have added
something like that in the patch I have posted today for the custom
wait events at [1] and it enforces the startup sequences of the
workers in a stricter way.

Does the attached take care of your issue?

[1]: https://www.postgresql.org/message-id/zmmuir7kvzpwe...@paquier.xyz
--
Michael
diff --git a/src/test/modules/worker_spi/t/001_worker_spi.pl b/src/test/modules/worker_spi/t/001_worker_spi.pl
index c293871313..f370ba93b8 100644
--- a/src/test/modules/worker_spi/t/001_worker_spi.pl
+++ b/src/test/modules/worker_spi/t/001_worker_spi.pl
@@ -58,22 +58,24 @@ $node->restart;
 # Check that bgworkers have been registered and launched.
 ok( $node->poll_query_until(
 		'mydb',
-		qq[SELECT datname, count(datname) FROM pg_stat_activity
-            WHERE backend_type = 'worker_spi' GROUP BY datname;],
-		'mydb|3'),
+		qq[SELECT datname, wait_event, count(datname) FROM pg_stat_activity
+            WHERE backend_type = 'worker_spi' GROUP BY datname, wait_event;],
+		'mydb|Extension|3'),
 	'bgworkers all launched'
 ) or die "Timed out while waiting for bgworkers to be launched";
 
 # Ask worker_spi to launch dynamic bgworkers with the library loaded, then
-# check their existence.
-my $worker1_pid = $node->safe_psql('mydb', 'SELECT worker_spi_launch(1);');
-my $worker2_pid = $node->safe_psql('mydb', 'SELECT worker_spi_launch(2);');
+# check their existence.  Use IDs that do not overlap with the schemas created
+# by the previous workers.
+my $worker1_pid = $node->safe_psql('mydb', 'SELECT worker_spi_launch(10);');
+my $worker2_pid = $node->safe_psql('mydb', 'SELECT worker_spi_launch(11);');
+
 ok( $node->poll_query_until(
 		'mydb',
-		qq[SELECT datname, count(datname)  FROM pg_stat_activity
+		qq[SELECT datname, wait_event, count(datname)  FROM pg_stat_activity
             WHERE backend_type = 'worker_spi dynamic' AND
-            pid IN ($worker1_pid, $worker2_pid) GROUP BY datname;],
-		'mydb|2'),
+            pid IN ($worker1_pid, $worker2_pid) GROUP BY datname, wait_event;],
+		'mydb|Extension|2'),
 	'dynamic bgworkers all launched'
 ) or die "Timed out while waiting for dynamic bgworkers to be launched";
 

Attachment: signature.asc
Description: PGP signature

Reply via email to