Yesterday, while attempting to access a database, I received errors saying that the database was inaccessible. After investigating a little, I found the following in the PostgreSQL log files:

2004-06-30 08:30:19 [24119] LOG: checkpoint process (PID 28423) was terminated by signal 11
2004-06-30 08:30:19 [24119] LOG: terminating any other active server processes
2004-06-30 08:30:19 [28383] WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
2004-06-30 08:30:19 [28362] WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.

The last bit then repeated a few more times, and then:

2004-06-30 08:30:20 [24119] LOG: all server processes terminated; reinitializing
2004-06-30 08:30:20 [28424] LOG: database system was interrupted at 2004-06-30 08:22:23 CDT
2004-06-30 08:30:20 [28424] LOG: checkpoint record is at 8/77703F9C
2004-06-30 08:30:20 [28424] LOG: redo record is at 8/775B1D38; undo record is at 0/0; shutdown FALSE
2004-06-30 08:30:20 [28424] LOG: next transaction ID: 1638554; next OID: 1058492
2004-06-30 08:30:20 [28424] LOG: database system was not properly shut down; automatic recovery in progress
2004-06-30 08:30:20 [28424] LOG: redo starts at 8/775B1D38
2004-06-30 08:30:21 [28430] LOG: connection received: host=[local] port=
2004-06-30 08:30:21 [28430] FATAL: the database system is starting up
2004-06-30 08:30:38 [28424] LOG: record with zero length at 8/78855F38
2004-06-30 08:30:38 [28424] LOG: redo done at 8/78853EE0
2004-06-30 08:31:40 [28449] LOG: connection received: host=[local] port=
2004-06-30 08:31:40 [28449] FATAL: the database system is starting up
2004-06-30 08:31:48 [28452] LOG: connection received: host=[local] port=
2004-06-30 08:31:48 [28452] FATAL: the database system is starting up
2004-06-30 08:31:53 [28459] LOG: connection received: host=[local] port=
2004-06-30 08:31:53 [28459] FATAL: the database system is starting up

And this then continues on and on. Even 20 minutes later, attempts to connect to the database were met with the same FATAL error.

Eventually I attempted to shut it down and restart it, but that failed too. When I attempted to shut it down, I discovered a hung 'startup subprocess' that can't be killed.

nexus:~# ps aux | grep postgres
postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres: startup subprocess
nexus:~# kill -9 28424
nexus:~# ps aux | grep postgres
postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres: startup subprocess
nexus:~#

As soon as I can get physical access to the machine, I'm planning to reboot it, as I can't think of anything else to do to kill a process that can't be kill -KILL'ed. I'm worried, however, that attempting to start the database after rebooting will fail in the same way.

Has anyone seen anything like this before, or have any ideas on how to proceed?

I'm running on an Intel Pentium Pro box with Debian GNU/Linux 'unstable'. I'm using PostgreSQL 7.4.3.

Thank you for your help.

-- 
| Christopher
+------------------------------------------------+
| Here I stand. I can do no other.               |
+------------------------------------------------+
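
A note on the ps output above: the D in the STAT column marks a process in uninterruptible sleep (typically blocked inside the kernel, often on I/O), and a signal, including SIGKILL, is not acted on until the process leaves that state. That would be consistent with kill -9 having no visible effect. The following is only a minimal sketch for spotting such processes in saved ps output; it assumes the usual Linux "ps aux" column layout, and the function name is made up for illustration.

```python
def uninterruptible(ps_output: str) -> list[tuple[int, str]]:
    """Return (pid, command) for every process whose STAT starts with 'D'.

    Assumes the Linux `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    found = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and any malformed lines
        if fields[7].startswith("D"):
            found.append((int(fields[1]), fields[10]))
    return found


# The line copied from the session above.
sample = (
    "postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 "
    "postgres: startup subprocess"
)
print(uninterruptible(sample))
```

This only identifies the stuck process; it does not say what the process is waiting on, which would need to be checked separately (kernel messages, disk health, and so on).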