Re: Yet another way for pg_ctl stop to fail on Windows

Noah Misch Sat, 07 Sep 2024 11:12:01 -0700

On Sat, Sep 07, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> With extra logging added, I got:
> ### Stopping node "CIC_2PC_test" using mode fast
> # Running: pg_ctl -D 
> C:\src\postgresql\build/testrun/amcheck_3/003_cic_2pc\data/t_003_cic_2pc_CIC_2PC_test_data/pgdata
> -m fast stop
> waiting for server to shut down......!!!pgkill| GetLastError(): 231
> postmaster (9596) died untimely? res: -1, errno: 22
>  failed
> 
> Thus, CallNamedPipe() in pgkill() returned ERROR_PIPE_BUSY (All pipe
> instances are busy) and it was handled as an unexpected error.
> (The error code 231 returned 10 times out of 10 failures of this ilk for
> me.)


Thanks for discovering that.

> Noah, what do you think of handling this error in line with handling of
> ERROR_BROKEN_PIPE and ERROR_BAD_PIPE (which was done in 0ea1f2a3a)?
> 
> I tried the following change:
>         switch (GetLastError())
>         {
>                 case ERROR_BROKEN_PIPE:
>                 case ERROR_BAD_PIPE:
> +               case ERROR_PIPE_BUSY:
> and saw no issues.

That would be a strict improvement over returning EINVAL like we do today.  We
do use PIPE_UNLIMITED_INSTANCES, so I expect the causes of ERROR_PIPE_BUSY are
process exit and ENOMEM-like situations.  While that change is the best thing
if the process is exiting, it could silently drop the signal in ENOMEM-like
situations.  Consider the following alternative.  If sig==0, just return 0
like you propose, because the process isn't completely gone.  Otherwise, sleep
and retry the signal, like pgwin32_open_handle() retries after certain errors.
What do you think of that?

Re: Yet another way for pg_ctl stop to fail on Windows

Reply via email to