On Tue, 2009-10-27 at 00:50 -0500, Karen Pease wrote:
> > OK, so there's nothing shrieklingly obviously wrong with what the
> > postmaster is up to. But what about the backend that's stopped
> > responding? Try connecting gdb to that "postgres" process once it's
> > stopped responding and get a backtrace from that.
> > 
> 
> Okay -- I started up a psql instance, which immediately locks up.  I
> then attached gdb to it and got this:
> 
> (gdb) cont
> Continuing.
> ^C
> Program received signal SIGINT, Interrupt.
> 0x00fe2416 in __kernel_vsyscall ()

You didn't actually request a backtrace (bt), so all it shows is the top
stack frame. That doesn't tell us anything except that it's busy in a
system call in the kernel.

> > You can find out a bit more about what the kernel is doing using the
> > "magic" keyboard sequence "ALT-SysRQ-T" from a vconsole (not under X).
> 
> Nothing happened.  Nothing useful in dmesg -- certainly no stacktraces.

Your kernel might not have the "magic sysrq key" enabled. Run:

  sudo sysctl -w kernel.sysrq=1

and try again. Note that on some systems with weird keyboards you might
have to hold the "Fn" key (if you have one) or disable "F-Lock" (if you
have it) to get SysRq to be recognised. The print screen / PrtScn key is
usually shared with SysRq even if it's not marked as such.

Hmm, it looks like the SysRq magic key sequences even work under X. I
didn't think they did, but on my system here hitting alt-sysrq-t under
X11 dumps a bunch of task trace data in to /var/log/kern.log (Ubuntu
system), including task info and some general info on the CPU states.

Oct 27 14:20:19 wallace kernel: [13668207.501781] postgres      S 28e389e2     
0  3152  31105
Oct 27 14:20:19 wallace kernel: [13668207.501781]  f352bcac 00200086 c3634358 
28e389e2 00029346 00000000 c069a340 c07d2180
Oct 27 14:20:19 wallace kernel: [13668207.501781]  c51fe480 c51fe6f8 c3634300 
ffffffff c3634300 00000000 c3634300 c51fe6f8
Oct 27 14:20:19 wallace kernel: [13668207.501781]  f352bca8 c01398c8 c90e0000 
7fffffff c90e01a8 f352bcf8 c050ec7d f352bcc8
Oct 27 14:20:19 wallace kernel: [13668207.501781] Call Trace:
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01398c8>] ? 
enqueue_task_fair+0x68/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c050ec7d>] 
schedule_timeout+0xad/0xe0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0156aea>] ? 
prepare_to_wait+0x3a/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c04aaf38>] 
unix_stream_data_wait+0x88/0xe0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0156890>] ? 
autoremove_wake_function+0x0/0x50
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c04ac631>] 
unix_stream_recvmsg+0x311/0x490
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c02b0990>] ? 
apparmor_socket_recvmsg+0x10/0x20
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c04356e4>] 
sock_recvmsg+0xf4/0x120
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0156890>] ? 
autoremove_wake_function+0x0/0x50
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01c3bb7>] ? 
__mem_cgroup_uncharge_common+0x137/0x170
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01a39b8>] ? 
__dec_zone_page_state+0x18/0x20
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01b0411>] ? 
page_remove_rmap+0x61/0x130
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c012eb00>] ? 
kunmap_atomic+0x50/0x60
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c043599c>] 
sys_recvfrom+0x7c/0xd0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c019c305>] ? 
__pagevec_free+0x25/0x30
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c019eee0>] ? 
release_pages+0x160/0x1a0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c02d2bfd>] ? 
rb_erase+0xcd/0x150
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0435a26>] 
sys_recv+0x36/0x40
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0435be7>] 
sys_socketcall+0x1b7/0x2b0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c014617b>] ? 
sys_gettimeofday+0x2b/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0109eef>] 
sysenter_do_call+0x12/0x2f


BTW, if you do get kernel task trace information and you decide to
redact it, it'd be good to include at least all your `postgres'
instances, all `httpd' instances, and the summary information at the
end.

--
Craig Ringer


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to