07.09.2024 21:11, Noah Misch wrote:
Noah, what do you think of handling this error in line with handling of ERROR_BROKEN_PIPE and ERROR_BAD_PIPE (which was done in 0ea1f2a3a)? I tried the following change: switch (GetLastError()) { case ERROR_BROKEN_PIPE: case ERROR_BAD_PIPE: + case ERROR_PIPE_BUSY: and saw no issues.That would be a strict improvement over returning EINVAL like we do today. We do use PIPE_UNLIMITED_INSTANCES, so I expect the causes of ERROR_PIPE_BUSY are process exit and ENOMEM-like situations. While that change is the best thing if the process is exiting, it could silently drop the signal in ENOMEM-like situations. Consider the following alternative. If sig==0, just return 0 like you propose, because the process isn't completely gone. Otherwise, sleep and retry the signal, like pgwin32_open_handle() retries after certain errors. What do you think of that?
Thank you for your attention to the issue! I agree with your approach. It looks like Microsoft recommends to loop on ERROR_PIPE_BUSY: [1] (they say "Calling CallNamedPipe is equivalent to calling the CreateFile ..." at [2]). So if we aim to not only fix "pg_ctl stop", but to make pgkill() robust, it's the way to go, IMHO. I'm not sure about an infinite loop they show, I'd vote for a loop with the same characteristics as in pgwin32_open_handle(). I've managed to trigger ERROR_PIPE_BUSY with "pg_ctl reload", when running 20 TAP tests (see attached) in parallel (with 20 threads executing $node->reload() in each), and with the kill() call inside do_reload() repeated x100 (see the modification patch attached too): ... # Running: pg_ctl -D .../099_test_pgkill\data/t_099_test_pgkill_node_data/pgdata reload ### Reloading node "node" # Running: pg_ctl -D .../099_test_pgkill\data/t_099_test_pgkill_node_data/pgdata reload [13:41:46.850](2.400s) # 18 server signaled server signaled server signaled server signaled server signaled server signaled server signaled server signaled !!!pgkill| GetLastError(): 231 pg_ctl: could not send reload signal (PID: 3912), iteration: 81, res: -1, errno: 22: Invalid argument server signaled [13:41:52.594](5.744s) # 19 ... [1] https://learn.microsoft.com/en-us/windows/win32/ipc/named-pipe-client [2] https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-callnamedpipea Best regards, Alexander
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index e7e878c22f..cab6eb9472 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -1167,12 +1174,17 @@ do_reload(void) exit(1); } - if (kill(pid, sig) != 0) +for (int i = 1; i <= 100; i++) +{ +int res, en; + if ((res = kill(pid, sig)) != 0) { - write_stderr(_("%s: could not send reload signal (PID: %d): %m\n"), - progname, (int) pid); +en = errno; + write_stderr(_("%s: could not send reload signal (PID: %d), iteration: %d, res: %d, errno: %d: %m\n"), + progname, (int) pid, i, res, en); exit(1); } +} print_msg(_("server signaled\n")); } diff --git a/src/port/kill.c b/src/port/kill.c index 412c2f19c1..b3d2c00796 100644 --- a/src/port/kill.c +++ b/src/port/kill.c @@ -70,6 +70,7 @@ pgkill(int pid, int sig) return 0; } +int le = GetLastError(); switch (GetLastError()) { case ERROR_BROKEN_PIPE: @@ -89,6 +90,7 @@ pgkill(int pid, int sig) errno = EPERM; return -1; default: +fprintf(stderr, "!!!pgkill| GetLastError(): %d\n", le); errno = EINVAL; /* unexpected */ return -1; }
099_test_pgkill.pl
Description: Perl program