I did my best to follow the gdb instructions.  I ran:

gdb -p 2852

Once it attached, I entered the logging statements, ran "cont", and then
ctrl-c'ed it a couple of times.  I got:

Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0  0x001e6416 in __kernel_vsyscall ()
#1  0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2  0x081dbaf9 in ?? ()
#3  0x081dd20a in PostmasterMain ()
#4  0x08190f96 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0  0x001e6416 in __kernel_vsyscall ()
#1  0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2  0x081dbaf9 in ?? ()
#3  0x081dd20a in PostmasterMain ()
#4  0x08190f96 in main ()
(gdb) quit
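For what it's worth, the same kind of backtrace can probably be captured without an interactive session. This is only a sketch. It assumes gdb's `-batch`, `-ex` and `-p` options and enough privilege to attach to the process (I ran as root). `backtrace_samples` is a hypothetical helper name, not something from the wiki page:

```shell
#!/bin/sh
# Hypothetical helper: take several non-interactive backtraces of a running
# process, roughly what the session above did by pressing Ctrl-C and typing
# "bt" repeatedly. Assumes gdb is installed and the caller may ptrace the
# target process (e.g. root).
backtrace_samples() {
    pid="$1"
    count="${2:-3}"   # number of samples; defaults to 3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count (pid $pid) ==="
        # -batch: run the -ex commands, then detach and exit.
        # -p: attach to the already-running process.
        gdb -batch -ex "bt" -p "$pid" 2>&1
        i=$((i + 1))
        sleep 1
    done
}
```

For example, `backtrace_samples 2852 3`. Taking several samples shows whether the postmaster stays in the same select() call, as it did in both traces above.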

Using your command line, the jammed httpd processes are:

[r...@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle                                  /usr/sbin/httpd
 3379     1 D    start_this_handle                                  /usr/sbin/httpd
 3381     1 D    start_this_handle                                  /usr/sbin/httpd
 4147     1 D    start_this_handle                                  /usr/sbin/httpd
 4539     1 D    start_this_handle                                  /usr/sbin/httpd
 5484     1 D    start_this_handle                                  /usr/sbin/httpd
11100     1 D    start_this_handle                                  /usr/sbin/httpd
14882     1 D    start_this_handle                                  /usr/sbin/httpd

These cannot be killed by kill -9.  Example:

[r...@chmmr dbscripts]# kill -9 3376
[r...@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle                                  /usr/sbin/httpd
 3379     1 D    start_this_handle                                  /usr/sbin/httpd
 3381     1 D    start_this_handle                                  /usr/sbin/httpd
 4147     1 D    start_this_handle                                  /usr/sbin/httpd
 4539     1 D    start_this_handle                                  /usr/sbin/httpd
 5484     1 D    start_this_handle                                  /usr/sbin/httpd
11100     1 D    start_this_handle                                  /usr/sbin/httpd
14882     1 D    start_this_handle                                  /usr/sbin/httpd
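To pick out only the stuck processes, the `ps` output can be filtered on the STAT column. A leading `D` there means uninterruptible sleep, which is why `kill -9` has no effect until the kernel call returns. This is a minimal sketch that assumes awk and the column order of the `ps` command above. `d_state_only` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical filter: keep the header line plus any process whose STAT
# (third column) starts with "D", i.e. uninterruptible sleep.
# Expects the column order pid,ppid,stat,wchan,cmd produced by:
#   ps ax -o pid,ppid,stat,wchan:50,cmd
d_state_only() {
    awk 'NR == 1 || $3 ~ /^D/'
}
```

Usage: `ps ax -o pid,ppid,stat,wchan:50,cmd | d_state_only`. Unlike `grep -i http`, this filter matches on process state rather than name, so it would also catch postmaster, pg_ctl or the init script if they were stuck in the kernel too.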

As mentioned, I can kill the postmaster, but I can't restart it without a
reboot; it hangs:

[r...@chmmr dbscripts]# ps -ef | grep -i postm
postgres  2852     1  0 Oct25 ?        00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
root     15115 14844  0 04:23 pts/0    00:00:00 grep -i postm
[r...@chmmr dbscripts]# /etc/init.d/postgresql stop
Stopping postgresql service: ^C^C                          [FAILED]
[r...@chmmr dbscripts]#
[r...@chmmr dbscripts]# killall -9 postmaster
[r...@chmmr dbscripts]# ps -ef | grep -i postm
root     15183 14844  0 04:24 pts/0    00:00:00 grep -i postm
[r...@chmmr dbscripts]# /etc/init.d/postgresql restart
Stopping postgresql service: ^C^C                          [FAILED]
^C
[r...@chmmr dbscripts]# /etc/init.d/postgresql start
^C

Using pg_ctl directly works no better than the postgresql control
script.

Again, I hope this helps.  Thanks!

        - Karen

On Mon, 2009-10-26 at 17:07 +0800, Craig Ringer wrote:
> Karen Pease wrote:
> > kill -9 does kill postmaster (or at least seems to).  But I can't figure
> > out a way to get it restarted without a reboot -- I don't know what I'm
> > missing.  The Fedora postgres restart scripts don't do the trick, and I
> > couldn't get it to work with pg_ctl either.
> 
> It'd help to know where the postmaster was stuck, and if possible where
> the backend you were using is stuck.
> 
> A backtrace from gdb can be handy for this.
> 
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
> 
> > kill -9 doesn't work on the locked-up httpd processes, so those require
> > a system restart.
> 
> If `kill -9' isn't working, they're probably in uninterruptible sleep in
> the kernel.
> 
> You can find out what they're sleeping in with `ps':
> 
>   ps ax -o pid,ppid,stat,wchan:50,cmd
> 
> (Filter for just the postmaster and postgres processes if you want)
> 
> > Both filesystems are EXT-4.
> 
> That's interesting given the issues you're having...
> 
> --
> Craig Ringer


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)