Hi all,

I started setting up a semi-automated method of simulating hard crashes, and even while setting it up I found some pretty unsettling results... Now, it's not unlikely that my testing is flawed, but unfortunately I don't see where right now (it's 3am and I have an 8-hour train ride behind me, so ...).
The simple test setup I have so far:

Server script:
* set up disk
* start pg
* wait to get killed
* set up disk
* start pg

Client side:
* CREATE DATABASE ... TEMPLATE crashtemplate
* CHECKPOINT
* make the device read-only, not allowing any cache flushes or the like (using devicemapper)
* kill server
* connect to database (sometimes it errors here)
* select * from $every_table (sometimes here)

At first pg survived that nicely without any problems. Then I came to my senses and started adding some background I/O, like:

dd if=/dev/zero of=/mnt/test/foobar bs=10M count=1000

That's where things started failing. All of these logs are from after the crash:

1:
FATAL: could not read relation mapping file "base/140883/pg_filenode.map": Interrupted system call
DEBUG: autovacuum: processing database "postgres"
FATAL: could not read relation mapping file "base/140883/pg_filenode.map": Success
DEBUG: autovacuum: processing database "postgres"
...
FATAL: could not read relation mapping file "base/58963/pg_filenode.map": No such file or directory

2:
FATAL: "base/165459" is not a valid data directory
DETAIL: File "base/165459/PG_VERSION" does not contain valid data.
HINT: You might need to initdb.

3:
You are now connected to database "test".
test=# SELECT execute('SELECT * FROM table_'||g.i) FROM generate_series(1, 3000) g(i);
ERROR: XX001: could not read block 0 in file "base/124499/11652": read only 0 of 8192 bytes
LOCATION: mdread, md.c:656
(I did not see that one with -o data=ordered,barrier=1,commit=300.)

I have tried the following mount options/filesystems so far:
-t ext4 -o data=writeback,barrier=1,commit=300,noauto_da_alloc
-t ext4 -o data=writeback,barrier=1,commit=300
-t ext4 -o data=writeback,barrier=0,commit=300
-t ext4 -o data=ordered,barrier=0,commit=300,noauto_da_alloc
-t ext4 -o data=ordered,barrier=1,commit=300,noauto_da_alloc
-t ext4 -o data=ordered,barrier=1,commit=300

The same with s/ext4/ext3/ and with commit=5.
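The mount-option matrix described above can be enumerated mechanically, which helps when driving the crash tests from a script. This is a minimal Python sketch, assuming that "with commit=5" means each of the six option sets was repeated with commit=5 in place of commit=300, and applying s/ext4/ext3/ literally (the post does not say whether the noauto_da_alloc variants were dropped for ext3). The names `BASE_OPTION_SETS` and `mount_matrix` are hypothetical and not taken from the attached scripts.

```python
from itertools import product

# The six option sets listed in the post, with the commit interval as a placeholder.
BASE_OPTION_SETS = [
    "data=writeback,barrier=1,commit={commit},noauto_da_alloc",
    "data=writeback,barrier=1,commit={commit}",
    "data=writeback,barrier=0,commit={commit}",
    "data=ordered,barrier=0,commit={commit},noauto_da_alloc",
    "data=ordered,barrier=1,commit={commit},noauto_da_alloc",
    "data=ordered,barrier=1,commit={commit}",
]
FILESYSTEMS = ["ext4", "ext3"]
# Assumption: every option set was run once with commit=300 and once with commit=5.
COMMIT_INTERVALS = [300, 5]


def mount_matrix():
    """Return (fstype, options) pairs for every filesystem/commit/option combination."""
    return [
        (fs, opts.format(commit=commit))
        for fs, commit, opts in product(FILESYSTEMS, COMMIT_INTERVALS, BASE_OPTION_SETS)
    ]


if __name__ == "__main__":
    for fs, opts in mount_matrix():
        # Print only the -t/-o part; device and mount point are left to the test scripts.
        print(f"-t {fs} -o {opts}")
```

Running it prints 24 lines, starting with `-t ext4 -o data=writeback,barrier=1,commit=300,noauto_da_alloc`, one per combination.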
With commit=5 the errors were much harder to reproduce (not that surprising), but they still occurred.

I attached my preliminary scripts/hacks... They even contain a comment or two. Note, though, that they are a bit of a loaded gun...

I guess it would be sensible to run some more extensive tests on a setup like that... All I have tested so far is CREATE DATABASE :-(

Andres
pg_crashtest_client.sh
pg_crashtest_server.sh
pg_createtemplate.sh
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers