On 26/10/2009 5:28 PM, Karen Pease wrote:
> I did my best to follow the gdb instructions. I ran:
>
> gdb -p 2852
>
> Then connected entered the logging statements, then ran "cont", then
> ctrl-c'ed it a couple times. I got:

OK, so there's nothing shriekingly obviously wrong with what the postmaster is up to. But what about the backend that's stopped responding? Try attaching gdb to that "postgres" process once it's stopped responding, and get a backtrace from it.

> [r...@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
> 3376 1 D start_this_handle /usr/sbin/httpd

start_this_handle appears in common ext4 call paths, and in several lkml issue reports over time:

http://lkml.org/lkml/2009/3/11/253
http://www.google.com.au/search?q=%22start_this_handle%22+ext4

Smells like a kernel bug. When two extremely stable pieces of software (Pg and Apache) both have issues on a well-tested kernel (Linux) with a new and fairly immature file system in use (ext4), suspecting the file system is probably not an unreasonable assumption.

You can find out a bit more about what the kernel is doing using the "magic" keyboard sequence Alt-SysRq-T from a virtual console (not under X). If the output scrolls past too fast, you can page through it with "less" on /var/log/kern.log (or /var/log/dmesg, depending on your distro), or with the "dmesg" command.

I wouldn't be too surprised if you see a kernel stack trace for your httpd process(es) starting something like this:

    schedule+0x18/0x40
    start_this_handle+0x374/0x508
    jbd2_journal_start+0xbc/0x11c
    ext4_journal_start_sb+0x5c/0x84
    ext4_dirty_inode+0xd4/0xf0

--
Craig Ringer

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs