Antonin Houska <a...@cybertec.at> wrote:

> While looking at postmaster.c:reaper(), one problematic case occurred to me.
>
> 1. Startup process signals PMSIGNAL_RECOVERY_STARTED.
>
> 2. Checkpointer process is forked and immediately dies.
>
> 3. reaper() catches this failure, calls HandleChildCrash() and thus sets
>    FatalError to true.
>
> 4. Startup process exits with non-zero status code too - either due to SIGQUIT
>    received from HandleChildCrash or due to some other failure of the startup
>    process itself. However, FatalError is already set, because of the previous
>    crash of the checkpointer. Thus reaper() does not set RecoveryError.
>
> 5. As RecoveryError failed to be set to true, postmaster will try to restart
>    the cluster, although it apparently should not.

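For reference, the sequence quoted above can be modeled with a small standalone sketch. This is not the postmaster code: PmModel, ChildKind, model_reaper() and model_will_restart() are hypothetical names invented for illustration, and the logic is only my simplified reading of the five steps (a startup-process failure sets RecoveryError only when FatalError is not yet set).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the postmaster flags; not the real globals. */
typedef struct
{
    bool fatal_error;    /* models FatalError */
    bool recovery_error; /* models RecoveryError */
} PmModel;

typedef enum
{
    CHILD_STARTUP,
    CHILD_CHECKPOINTER
} ChildKind;

/*
 * Simplified model of the reaper() behavior described in steps 3 and 4.
 * Assumption: any non-zero exit status is treated as a crash.
 */
static void
model_reaper(PmModel *pm, ChildKind child, int exit_status)
{
    if (exit_status == 0)
        return;

    /*
     * A failed startup process marks recovery as failed only if no
     * earlier crash is already being handled.
     */
    if (child == CHILD_STARTUP && !pm->fatal_error)
        pm->recovery_error = true;

    /* Models the effect of HandleChildCrash(): remember the crash. */
    pm->fatal_error = true;
}

/*
 * Simplified model of the restart decision in step 5: restart only after
 * a crash, only if recovery itself did not fail, and only if the
 * restart_after_crash setting allows it.
 */
static bool
model_will_restart(const PmModel *pm, bool restart_after_crash)
{
    return pm->fatal_error && !pm->recovery_error && restart_after_crash;
}
```

In this model, reaping a checkpointer failure first and a startup failure second leaves recovery_error false, so model_will_restart() returns true, which is the restart that step 5 objects to. Reaping the startup failure first sets recovery_error and suppresses the restart.
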
A more common case occurred to me as soon as I sent the previous mail: any
process of a standby cluster dies. The proposed fix would then make
restart_after_crash (GUC) completely ineffective for standbys. I'm not sure
whether that's desired. The question is whether RecoveryError should reflect
problems during any kind of recovery, or only during crash recovery.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at