[Desktop-packages] [Bug 1042556] Re: Critical data loss bug in postgresql-common initscript

Bug Watch Updater Thu, 04 Oct 2012 02:36:34 -0700

** Changed in: postgresql-common (Debian)
       Status: New => Fix Released


-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to postgresql-common in Ubuntu.
https://bugs.launchpad.net/bugs/1042556

Title:
  Critical data loss bug in postgresql-common initscript

Status in “postgresql-common” package in Ubuntu:
  Fix Released
Status in “postgresql-common” source package in Lucid:
  In Progress
Status in “postgresql-common” source package in Precise:
  Fix Released
Status in “postgresql-common” package in Debian:
  Fix Released

Bug description:
  Hi

  The Debian packages for PostgreSQL (and thus the Ubuntu packages
  because of the shared use of pg_wrapper) are subject to a potentially
  critical data loss bug because of an unsafe procedure for restarting
  PostgreSQL.

  This issue has been recognised and patched in Debian:

      http://anonscm.debian.org/loggerhead/pkg-postgresql/postgresql-common/trunk/revision/1181
      http://archives.postgresql.org/pgsql-general/2012-07/msg00501.php

  but should be urgently included in Ubuntu and backported.

  I quote Tom Lane (key PostgreSQL dev):

          [The] forced unlink on the postmaster.pid file [...] (a) is entirely
          unnecessary, and (b) defeats the safety interlock against starting a
          new postmaster before all the old backends have flushed out.

  It is VITAL that pg_wrapper NEVER unlink the postmaster.pid file. The
  postmaster will do that itself if it finds the pid to be stale, but
  only after performing some checks to make sure there are no backends
  still running and to ensure that there's no other postmaster running
  against the database.
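
  As a rough illustration of a non-destructive alternative to unlinking,
  the sketch below only inspects postmaster.pid and reports its state. The
  function name pid_file_state is hypothetical and is not part of
  postgresql-common. It relies on the first line of postmaster.pid holding
  the postmaster's PID, and it assumes it runs as root or as the postgres
  user, since kill -0 also fails for a live process owned by someone else.
  It is far weaker than the postmaster's own checks (it cannot see orphaned
  backends), so it must never be used as a reason to delete the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of postgresql-common): report the state of
# a postmaster.pid file WITHOUT ever removing it.
# Prints one of: missing, invalid, running, stale.
pid_file_state() {
    pidfile="$1"

    # No file at all: nothing to inspect.
    [ -f "$pidfile" ] || { echo missing; return 0; }

    # The first line of postmaster.pid holds the postmaster's PID.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')

    # Reject empty or non-numeric content instead of guessing.
    case "$pid" in
        ''|*[!0-9]*) echo invalid; return 0 ;;
    esac

    # kill -0 delivers no signal; it only tests whether the process can be
    # signalled. Assumption: we run as root or as the process owner.
    if kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        # Only a hint: backends of the old postmaster may still be alive,
        # which is why removal is left to the postmaster itself.
        echo stale
    fi
}
```

  For example, pid_file_state /var/lib/postgresql/9.1/test/postmaster.pid
  could be run before pg_ctlcluster start purely for diagnostics; the
  decision to remove a stale file stays with the postmaster.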

  See:
      http://archives.postgresql.org/pgsql-general/2012-07/msg00475.php

  Context here:

      http://archives.postgresql.org/pgsql-general/2012-07/msg00350.php
      http://dba.stackexchange.com/questions/20959/recover-postgresql-database-from-wal-errors-on-startup/20961

  SRU INFORMATION:
   * Impact: Severe data loss in rare corner cases.

   * Regression potential: Very low. The change has been in Debian,
  Quantal, and my very popular PostgreSQL backports repository for quite
  some time. pg_ctlcluster has a function start_check_pid_file() which
  cleans up a stale PID file on startup, so a PID file left behind when
  pg_ctlcluster stop --force falls back to kill -9 on the postmaster
  does not block a subsequent startup. The test suite (t/030_errors.t)
  explicitly covers scenarios with missing, broken, and stale PID files
  and ensures that they are handled properly.

   * Test case: I do not know a realistic and reliable test case to
  cause the data loss, but the analysis of the bug in the above ML thread
  is very clear. I suggest regression-testing the change only, i.e.
  running the postgresql-common test suite and manually checking that
  starting a cluster still works with a stale pid file around:

    sudo pg_createcluster 9.1 test --start
    sudo cp /var/lib/postgresql/9.1/test/postmaster.pid{,.save}
    sudo pg_ctlcluster 9.1 test stop
    # now cause a stale pid file
    sudo cp /var/lib/postgresql/9.1/test/postmaster.pid{.save,}
    
    # this should succeed and say "Removed stale pid file."
    sudo pg_ctlcluster 9.1 test start
    
    # this should say that 9.1/test is online
    pg_lsclusters

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/postgresql-common/+bug/1042556/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to     : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp
