** Changed in: postgresql-common (Debian)
       Status: New => Fix Released
-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to postgresql-common in Ubuntu.
https://bugs.launchpad.net/bugs/1042556

Title:
  Critical data loss bug in postgresql-common initscript

Status in “postgresql-common” package in Ubuntu:
  Fix Released
Status in “postgresql-common” source package in Lucid:
  In Progress
Status in “postgresql-common” source package in Precise:
  Fix Released
Status in “postgresql-common” package in Debian:
  Fix Released

Bug description:
  Hi,

  The Debian packages for PostgreSQL (and thus the Ubuntu packages, because
  of the shared use of pg_wrapper) are subject to a potentially critical
  data loss bug caused by an unsafe procedure for restarting PostgreSQL.

  This issue has been recognised and patched in Debian:

  http://anonscm.debian.org/loggerhead/pkg-postgresql/postgresql-common/trunk/revision/1181
  http://archives.postgresql.org/pgsql-general/2012-07/msg00501.php

  but it should urgently be included in Ubuntu and backported.

  I quote Tom Lane (key PostgreSQL dev):

    [The] forced unlink on the postmaster.pid file [...] (a) is entirely
    unnecessary, and (b) defeats the safety interlock against starting a
    new postmaster before all the old backends have flushed out.

  It is VITAL that pg_wrapper NEVER unlink the postmaster.pid file. The
  postmaster will do that itself if it finds the PID to be stale, but only
  after performing some checks to make sure that no backends are still
  running and that no other postmaster is running against the database.

  See: http://archives.postgresql.org/pgsql-general/2012-07/msg00475.php

  Context here:
  http://archives.postgresql.org/pgsql-general/2012-07/msg00350.php
  http://dba.stackexchange.com/questions/20959/recover-postgresql-database-from-wal-errors-on-startup/20961

  SRU INFORMATION:

  * Impact: Severe data loss in rare corner cases.

  * Regression potential: Very low.
    The change has been in Debian, Quantal, and my very popular PostgreSQL
    backports repository for quite some time. pg_ctlcluster has a function
    start_check_pid_file() which cleans up a stale PID file on startup if
    one still exists after "pg_ctlcluster stop --force" has resorted to
    kill -9 on the postmaster, so that the leftover file does not block a
    subsequent startup. The test suite (t/030_errors.t) explicitly covers
    scenarios with missing, broken, and stale PID files and ensures that
    they are handled properly.

  * Test case: I do not know a realistic and reliable test case that
    triggers the data loss, but the analysis of the bug in the mailing
    list thread above is very clear. I suggest regression-testing the
    change only, i.e. running the postgresql-common test suite and
    manually checking that starting a cluster still works when a stale
    PID file is around:

      sudo pg_createcluster 9.1 test --start
      sudo cp /var/lib/postgresql/9.1/test/postmaster.pid{,.save}
      sudo pg_ctlcluster 9.1 test stop
      # now cause a stale pid file
      sudo cp /var/lib/postgresql/9.1/test/postmaster.pid{.save,}
      # this should succeed and say "Removed stale pid file."
      sudo pg_ctlcluster 9.1 test start
      # this should say that 9.1/test is online
      pg_lsclusters
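The verification steps of the manual check above could be scripted. The following is a minimal sketch, not part of postgresql-common: the helper names pid_from_file, pid_file_is_stale and cluster_status are hypothetical, the staleness check assumes Linux (it looks for /proc/<pid>), and cluster_status assumes that Status is the fourth whitespace-separated column of pg_lsclusters output. In keeping with the bug report, the helpers only read files and command output and never unlink postmaster.pid. A PID that is no longer running is only a sanity check for the test setup, not a substitute for the postmaster's own interlock.

```shell
# Hypothetical helpers for scripting the regression check; source this file.
# They only read files and command output; they never delete postmaster.pid.

# pid_from_file FILE: print the postmaster PID (first line of postmaster.pid).
# Returns 1 if the file is missing, empty, or its first line is not numeric.
pid_from_file() {
    pid=$(head -n 1 "$1" 2>/dev/null) || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: broken pid file
    esac
    printf '%s\n' "$pid"
}

# pid_file_is_stale FILE: exit 0 if the PID in FILE has no running process,
# 1 if the process still exists, 2 if FILE is missing or broken.
# Linux-specific assumption: checks for /proc/<pid>, which works even when
# the process belongs to another user (unlike "kill -0" run without root).
pid_file_is_stale() {
    stale_pid=$(pid_from_file "$1") || return 2
    [ ! -d "/proc/$stale_pid" ]
}

# cluster_status VERSION NAME: read pg_lsclusters output on stdin and print
# the Status column (assumed to be field 4) for that cluster; exit 1 if absent.
cluster_status() {
    # Concatenating "" forces string comparison, so "9.1" never equals "9.10".
    awk -v ver="$1" -v name="$2" '
        $1 == ver "" && $2 == name "" { print $4; found = 1 }
        END { exit !found }'
}
```

For example, source the file from a root shell (the cluster's data directory is readable only by postgres). After restoring the saved pid file and before starting the cluster, run

      pid_file_is_stale /var/lib/postgresql/9.1/test/postmaster.pid && echo stale

and after "pg_ctlcluster 9.1 test start", run

      pg_lsclusters | cluster_status 9.1 test

which should print "online" if the start succeeded.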