At Sun, 31 Dec 2023 20:07:41 +0900 (JST), Kyotaro Horiguchi <horikyota....@gmail.com> wrote in
> We've noticed that when walreceiver is waiting for a connection to
> complete, standby does not immediately respond to promotion
> requests. In PG14, upon receiving a promotion request, walreceiver
> terminates instantly, but in PG16, it waits for connection
> timeout. This behavior is attributed to commit 728f86fec65, where a
> part of libpqrcv_connect was simply replaced with a call to
> libpqsrc_connect_params. This behavior can be verified by simply
> dropping packets from the standby to the primary.

I apologize for the inconvenience, but I need to fix this behavior. To
continue this discussion, I'm providing a repro script here.

With the script, the standby is expected to promote immediately,
emitting the following log lines:

standby.log:
> 2024-01-18 16:25:22.245 JST [31849] LOG: received promote request
> 2024-01-18 16:25:22.245 JST [31850] FATAL: terminating walreceiver process due to administrator command
> 2024-01-18 16:25:22.246 JST [31849] LOG: redo is not required
> 2024-01-18 16:25:22.246 JST [31849] LOG: selected new timeline ID: 2
> 2024-01-18 16:25:22.274 JST [31849] LOG: archive recovery complete
> 2024-01-18 16:25:22.275 JST [31847] LOG: checkpoint starting: force
> 2024-01-18 16:25:22.277 JST [31846] LOG: database system is ready to accept connections
> 2024-01-18 16:25:22.280 JST [31847] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/1548E98, redo lsn=0/1548E40
> 2024-01-18 16:25:22.356 JST [31846] LOG: received immediate shutdown request
> 2024-01-18 16:25:22.361 JST [31846] LOG: database system is shut down

After 728f86fec65 was introduced, the same operation no longer
completes promotion, as shown below. The patch attached to my previous
mail restores the old behavior shown above.

> 2024-01-18 16:47:53.314 JST [34515] LOG: received promote request
> 2024-01-18 16:48:03.947 JST [34512] LOG: received immediate shutdown request
> 2024-01-18 16:48:03.952 JST [34512] LOG: database system is shut down

The attached script requires that sudo be available to the user
running it.

There's another point to note. The script attempts to establish a
replication connection to $primary_addr:$primary_port. For the packet
filter to work, this must be a remote address that is reachable when
no packet-filter rule is in place. The firewall-cmd rule then needs to
be configured to block this connection. If an unreachable IP address
is simply specified instead, the connection attempt fails immediately
with a "No route to host" error before the first packet is sent out,
so it is never blocked as intended.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

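P.S. As a rough sketch of the address requirement above (not part of the attached script), a TCP probe can show whether a connection attempt fails at once or is left waiting. The helper names `classify_attempt` and `probe` and the "less than half the timeout" threshold are assumptions made for this illustration only. Without the packet-drop rule, 'connected' is the clear sign that the address is usable. With the rule in place, 'timed_out' is the expected result, while 'failed_fast' suggests the attempt is rejected before any packet could be dropped.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Time::HiRes qw(time);

# Pure classification step (hypothetical helper, not in the repro script).
#   $connected: true if the TCP connect succeeded
#   $elapsed:   seconds the attempt took
#   $timeout:   timeout that was given to the attempt
sub classify_attempt {
    my ($connected, $elapsed, $timeout) = @_;
    return 'connected' if $connected;
    # Heuristic (an assumption): a failure well before the timeout means the
    # attempt was rejected outright, e.g. "No route to host" or
    # "Connection refused", rather than left waiting for a reply.
    return 'failed_fast' if $elapsed < $timeout / 2;
    return 'timed_out';
}

# Try a TCP connection and classify how it ended.
sub probe {
    my ($host, $port, $timeout) = @_;
    my $start = time();
    my $sock  = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    my $elapsed = time() - $start;
    close($sock) if $sock;
    return classify_attempt(defined $sock, $elapsed, $timeout);
}

# Example values copied from the repro script; adjust for the local network.
# print probe('192.168.56.1', 5432, 5), "\n";
```

Running the probe once before and once after adding the firewall rule gives a quick check that the rule is actually in the connection's path.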
#! /bin/perl

use strict;
use warnings;
use Cwd;

# This IP address must be a valid and accessible remote address,
# otherwise replication connection will immediately fail with 'No
# route to host', while we want to wait for replication connection to
# complete.
my $primary_addr = '192.168.56.1';
my $primary_port = 5432;

my $fwzone = 'public';
my $rootdir = cwd();
my $standby_dir = "$rootdir/standby";
my $standby_port = 5432;
my $standby_logfile = "standby.log";

# The rule is wrapped in single quotes so the shell passes it as one argument.
my $rich_rule = "'rule family=\"ipv4\" destination address=\"$primary_addr\" port port=\"$primary_port\" protocol=\"tcp\" drop'";
my $add_cmd = "sudo firewall-cmd --zone=$fwzone --add-rich-rule=$rich_rule";
my $del_cmd = "sudo firewall-cmd --zone=$fwzone --remove-rich-rule=$rich_rule";

# Remove old entities
system('killall -9 postgres');
system("rm -rf $standby_dir $standby_logfile");

# Setup packet drop
system($add_cmd) == 0 or die "failed to exec \"$add_cmd\" (status $?)\n";

# Setup a standby.
system("initdb -D $standby_dir -c primary_conninfo='host=$primary_addr port=$primary_port'");
system("touch $standby_dir/standby.signal");

# Start it.
system("pg_ctl start -D $standby_dir -o \"-p $standby_port\" -l $standby_logfile");
sleep 1;

# Try promoting standby, waiting for 10 seconds.
system("pg_ctl promote -t 10 -D $standby_dir");

# Stop servers
system("pg_ctl stop -m i -D $standby_dir");

# Remove packet-drop setting
system($del_cmd) == 0 or die "failed to exec \"$del_cmd\" (status $?)\n";
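
As a possible follow-up to the script (a minimal sketch, not part of the attachment), the two outcomes shown in the log excerpts of this mail can be told apart by scanning standby.log for a "received promote request" line followed later by a "selected new timeline ID" line. The helper names `promotion_completed` and `check_log_file` are hypothetical, and the message strings are taken only from the excerpts quoted here, so other server versions or locales may word them differently.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the log text shows a promote request
# followed by a new timeline selection, 0 otherwise.
sub promotion_completed {
    my ($log_text) = @_;
    my $requested = 0;
    for my $line (split /\n/, $log_text) {
        $requested = 1 if $line =~ /LOG:\s+received promote request/;
        return 1
          if $requested && $line =~ /LOG:\s+selected new timeline ID: \d+/;
    }
    return 0;
}

# Read a whole log file and apply the check above.
sub check_log_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!\n";
    local $/;    # slurp mode: read the file in one go
    my $text = <$fh>;
    close($fh);
    return promotion_completed($text);
}

# Usage sketch, with the log file name used by the repro script:
# print check_log_file('standby.log') ? "promoted\n" : "not promoted\n";
```

For the first excerpt in this mail the check reports that promotion completed. For the second, where the shutdown request follows the promote request directly, it reports that it did not.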