On Sat, Jun 20, 2015 at 12:07 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paqu...@gmail.com> writes:
>> Now if we look at RewindTest.pm, there is the following code:
>>     if ($test_master_datadir)
>>     {
>>         system
>>           "pg_ctl -D $test_master_datadir -s -m immediate stop 2> /dev/null";
>>     }
>>     if ($test_standby_datadir)
>>     {
>>         system
>>           "pg_ctl -D $test_standby_datadir -s -m immediate stop 2> /dev/null";
>>     }
>> And I think that the problem is triggered because we are missing a -w
>> switch here, meaning that we do not wait until the confirmation that
>> the server has stopped, and visibly if stop is slow enough the next
>> server to use cannot start because the port is already taken by the
>> server currently stopping.
>
> After I woke up a bit more, I remembered that -w is already the default
> for "pg_ctl stop", so your diagnosis here is incorrect.
Ah right. I forgot that. Perhaps I just got lucky in my runs.

> I suspect that the real problem is the arbitrary decision to use -m
> immediate. The postmaster would ordinarily wait for its children to
> die, but on a slow machine we could perhaps reach the end of that
> 5-second timeout, whereupon the postmaster would SIGKILL its children
> *and exit immediately*. I'm not sure how instantaneous SIGKILL is,
> but it seems possible that we could end up trying to start the new
> postmaster before all the children of the old one are dead. If the
> shmem interlock is working properly that ought to fail.
>
> I wonder whether it's such a good idea for the postmaster to give
> up waiting before all children are gone (postmaster.c:1722 in HEAD).

I don't think so either.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers