Thanks for restarting this thread. At Tue, 9 Jan 2024 09:40:23 +0900, Michael Paquier <mich...@paquier.xyz> wrote in > On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote: > > I'm not a Windows expert, but my guess is that 0001 is a very good > > idea. I hope someone who is a Windows expert will comment on that. > > I am +1 on 0001. It is just something we've never anticipated when > these wrappers around cmd in pg_ctl were written.
Thanks for committing it. > > 0002 seems problematic to me. One potential issue is that it would > > break if someone renamed postgres.exe to something else -- although > > that's probably not really a serious problem. > > We do a find_other_exec_or_die() on "postgres" with what could be a > custom execution path. So we're actually sure that the binary will be > there in the start path, no? I don't like much the hardcoded > dependency to .exe here. The patch doesn't care of the path for postgres.exe. If you are referring to the code you cited below, it's for another reason. I'll describe that there. > > A bigger issue is that > > it seems like it would break if someone used pg_ctl to start several > > instances in different data directories on the same machine. If I'm > > understanding correctly, that currently mostly works, and this would > > break it. > > Not having the guarantee that a single shell_pid is associated to a > single postgres.exe would be a problem. Now the patch includes this > code: > + if (ppe.th32ParentProcessID == shell_pid && > + strcmp("postgres.exe", ppe.szExeFile) == 0) > + { > + if (pm_pid != ppe.th32ProcessID && pm_pid != 0) > + multiple_children = true; > + pm_pid = ppe.th32ProcessID; > + } > > Which is basically giving this guarantee? multiple_children should > never happen once the autorun part is removed. Is that right? The patch indeed ensures the relationship between the parent pg_ctl.exe and postgres.exe. However, for some reason, in my Windows 11 environment with the /D option specified, I observed that another cmd.exe is spawned as the second child process of the parent cmd.exe. This is why there is a need to verify the executable file name. I have no idea how the second cmd.exe is being spawned. > + * The launcher shell might start other cmd.exe instances or programs > + * besides postgres.exe. Veryfying the program file name is essential. > > With the autorun part of cmd.exe removed, what's still relevant here? Yes, if the strcmp() is commented out, multiple_children sometimes becomes true.. > s/Veryfying/Verifying/. Oops! > Perhaps 0002 should make more efforts in documenting things like > th32ProcessID and th32ParentProcessID. Is it correct to understand that you are requesting changes as follows? --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid) * * Check for duplicate processes to ensure reliability. * - * The launcher shell might start other cmd.exe instances or programs - * besides postgres.exe. Verifying the program file name is essential. - * - * The launcher shell process isn't checked in this function. It will be - * checked by the caller. + * The ppe entry to be examined is identified by th32ParentProcessID, which + * should correspond to the cmd.exe process that executes the postgres.exe + * binary. Additionally, th32ProcessID in the same entry should be the PID + * of the launched postgres.exe. However, even though we have launched the + * parent cmd.exe with the /D option specified, it is sometimes observed + * that another cmd.exe is launched for unknown reasons. Therefore, it is + * crucial to verify the program file name to avoid returning the wrong + * PID. */ regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From f9f2486f18502357fb76f2f1da00e19db40b2bc9 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Tue, 24 Oct 2023 14:46:50 +0900 Subject: [PATCH v6 1/2] Improve pg_ctl postmaster process check on Windows Currently pg_ctl on Windows does not verify that it actually executed a postmaster process due to lack of process ID knowledge. This can lead to false positives in cases where another pg_ctl instance starts a different server simultaneously. This patch adds the capability to identify the process ID of the launched postmaster on Windows, similar to other OS versions, ensuring more reliable detection of concurrent server startups. --- src/bin/pg_ctl/pg_ctl.c | 105 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 96 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 6900b27675..dec0afdb29 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -131,6 +131,7 @@ static void adjust_data_dir(void); #ifdef WIN32 #include <versionhelpers.h> +#include <tlhelp32.h> static bool pgwin32_IsInstalled(SC_HANDLE); static char *pgwin32_CommandLine(bool); static void pgwin32_doRegister(void); @@ -141,6 +142,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *); static void pgwin32_doRunAsService(void); static int CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service); static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken); +static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid); #endif static pid_t get_pgpid(bool is_status_request); @@ -608,7 +610,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint) /* File is complete enough for us, parse it */ pid_t pmpid; time_t pmstart; - +#ifndef WIN32 + pid_t wait_pid = pm_pid; +#else + pid_t wait_pid = pgwin32_find_postmaster_pid(pm_pid); +#endif /* * Make sanity checks. If it's for the wrong PID, or the recorded * start time is before pg_ctl started, then either we are looking @@ -618,14 +624,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint) */ pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]); pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]); - if (pmstart >= start_time - 2 && -#ifndef WIN32 - pmpid == pm_pid -#else - /* Windows can only reject standalone-backend PIDs */ - pmpid > 0 -#endif - ) + + if (pmstart >= start_time - 2 && pmpid == wait_pid) { /* * OK, seems to be a valid pidfile from our child. Check the @@ -1949,6 +1949,93 @@ GetPrivilegesToDelete(HANDLE hToken) return tokenPrivs; } + +/* + * Find the PID of the launched postmaster. + * + * On Windows, the cmd.exe doesn't support the exec command. As a result, we + * don't directly get the postmaster's PID. This function identifies the PID of + * the postmaster started by the child cmd.exe. + * + * Returns the postmaster's PID. If the shell is alive but the postmaster is + * missing, returns 0. Otherwise terminates this command with an error. + * + * This function uses PID 0 as an invalid value, assuming the system idle + * process occupies it and it won't be a PID for a shell or postmaster. + */ +static pid_t +pgwin32_find_postmaster_pid(pid_t shell_pid) +{ + HANDLE hSnapshot; + PROCESSENTRY32 ppe; + pid_t pm_pid = 0; /* abusing 0 as an invalid value */ + bool multiple_children = false; + DWORD last_error; + + /* create a process snapshot */ + hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0); + if (hSnapshot == INVALID_HANDLE_VALUE) + { + write_stderr(_("%s: CreateToolhelp32Snapshot failed: error code %lu\n"), + progname, (unsigned long) GetLastError()); + exit(1); + } + + /* start iterating on the snapshot */ + ppe.dwSize = sizeof(PROCESSENTRY32); + if (!Process32First(hSnapshot, &ppe)) + { + write_stderr(_("%s: Process32First failed: error code %lu\n"), + progname, (unsigned long) GetLastError()); + exit(1); + } + + /* + * Iterate over the snapshot + * + * Check for duplicate processes to ensure reliability. + * + * The ppe entry to be examined is identified by th32ParentProcessID, which + * should correspond to the cmd.exe process that executes the postgres.exe + * binary. Additionally, th32ProcessID in the same entry should be the PID + * of the launched postgres.exe. However, even though we have launched the + * parent cmd.exe with the /D option specified, it is sometimes observed + * that another cmd.exe is launched for unknown reasons. Therefore, it is + * crucial to verify the program file name to avoid returning the wrong + * PID. + */ + do + { + if (ppe.th32ParentProcessID == shell_pid && + strcmp("postgres.exe", ppe.szExeFile) == 0) + { + if (pm_pid != ppe.th32ProcessID && pm_pid != 0) + multiple_children = true; + pm_pid = ppe.th32ProcessID; + } + } + while (Process32Next(hSnapshot, &ppe)); + + /* avoid multiple calls primary for clarity, not out of necessity */ + last_error = GetLastError(); + if (last_error != ERROR_NO_MORE_FILES) + { + write_stderr(_("%s: Process32Next failed: error code %lu\n"), + progname, (unsigned long) last_error); + exit(1); + } + CloseHandle(hSnapshot); + + /* assuming the launching shell executes a single process */ + if (multiple_children) + { + write_stderr(_("%s: multiple postmasters found\n"), + progname); + exit(1); + } + + return pm_pid; +} #endif /* WIN32 */ static void -- 2.39.3
>From b453e1f2d1e2fefbee93c4573e6ffcf8897bf319 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Tue, 24 Oct 2023 14:47:24 +0900 Subject: [PATCH v6 2/2] Remove short sleep from 001_start_stop.pl Previous commits ensures reliable detection whether the postmaster has really started or not, so the short sleep in a test is not needed anymore. It was originally introduced by 6bcce258. --- src/bin/pg_ctl/t/001_start_stop.pl | 5 ----- 1 file changed, 5 deletions(-) diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl index fd56bf7706..b50bd30a12 100644 --- a/src/bin/pg_ctl/t/001_start_stop.pl +++ b/src/bin/pg_ctl/t/001_start_stop.pl @@ -46,11 +46,6 @@ my $ctlcmd = [ ]; command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start'); -# sleep here is because Windows builds can't check postmaster.pid exactly, -# so they may mistake a pre-existing postmaster.pid for one created by the -# postmaster they start. Waiting more than the 2 seconds slop time allowed -# by wait_for_postmaster() prevents that mistake. -sleep 3 if ($windows_os); command_fails([ 'pg_ctl', 'start', '-D', "$tempdir/data" ], 'second pg_ctl start fails'); command_ok([ 'pg_ctl', 'stop', '-D', "$tempdir/data" ], 'pg_ctl stop'); -- 2.39.3