On Monday 01 March 2010 @ 16:03, Ed L. wrote:
> On Monday 01 March 2010 @ 15:59, Ed L. wrote:
> > > This just happened again ~24 hours after full reload from
> > >  backup. Arrrgh.
> > >
> > > Backtrace looks the same again, same file, same
> > > __read_nocancel().  $PGDATA/global/pg_auth looks fine to
> > > me, permissions are 600, entries are 3 or more
> > > double-quoted items per line each separated by a space,
> > > items 3 and beyond being groups.
> > >
> > > Any clues?
> 
> Also seeing lots of postmaster zombies (190 and growing)...

While new connections are hanging, top shows the postmaster using 
100% of CPU.  SIGTERM and SIGQUIT have no effect.  Here's a backtrace 
of this busy postmaster:

(gdb) bt
#0  0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
#1  0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
#2  0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3  0x000000346f8689cb in getc () from /lib64/libc.so.6
#4  0x0000000000531ee8 in next_token (fp=0x10377ae0, buf=0x7fff32230e60 "", bufsz=4096) at hba.c:128
#5  0x0000000000532233 in tokenize_file (filename=0x10359b70 "global", file=0x10377ae0, lines=0x7fff322310f8, line_nums=0x7fff322310f0) at hba.c:232
#6  0x00000000005322e9 in tokenize_file (filename=0x2b1c8cbf5800 "global/pg_auth", file=0x103767a0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7  0x00000000005327ff in load_role () at hba.c:959
#8  0x000000000057f878 in sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:3830
#9  <signal handler called>
#10 0x000000346f8cb323 in __select_nocancel () from /lib64/libc.so.6
#11 0x000000000057cc33 in ServerLoop () at postmaster.c:1236
#12 0x000000000057dfdf in PostmasterMain (argc=6, argv=0x1033f000) at postmaster.c:1031
#13 0x00000000005373de in main (argc=6, argv=<value optimized out>) at main.c:188
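
For anyone who wants to check a pg_auth file mechanically rather than by eye, here is a rough Python sketch of the format check described in the quoted text above (3 or more double-quoted items per line, each separated by a single space). The names `check_auth_line` and `check_auth_file` are made up for illustration, and the rule that an embedded double quote is written as two double quotes is an assumption, not something taken from the server source.

```python
import re

# Assumption: every item is wrapped in double quotes, and a literal double
# quote inside an item is written as two double quotes.
ITEM = re.compile(r'"((?:[^"]|"")*)"')


def check_auth_line(line):
    """Return a list of problems found in one line; an empty list means OK."""
    text = line.rstrip("\n")
    items = []
    pos = 0
    while pos < len(text):
        m = ITEM.match(text, pos)
        if m is None:
            return ["unparsable text at column %d" % (pos + 1)]
        items.append(m.group(1).replace('""', '"'))
        pos = m.end()
        if pos < len(text):
            # Items must be separated by exactly one space.
            if text[pos] != " ":
                return ["expected a space at column %d" % (pos + 1)]
            pos += 1
            if pos == len(text):
                return ["trailing space at end of line"]
    if len(items) < 3:
        return ["only %d item(s), expected 3 or more" % len(items)]
    return []


def check_auth_file(path):
    """Return (line_number, problem) pairs for every bad line in the file."""
    bad = []
    with open(path, "r") as fp:
        for number, line in enumerate(fp, start=1):
            for problem in check_auth_line(line):
                bad.append((number, problem))
    return bad
```

Calling `check_auth_file` on a copy of the file (not the live one under $PGDATA) returns an empty list when every line matches the described layout. It only checks the layout, so a clean result does not rule out other causes of the hang.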

...and more from the server logs, fwiw:

2010-03-01 17:30:24.213 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:30:31.250 CST [32236]    DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:31:24.216 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:32:24.219 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:33:24.222 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:34:24.225 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:19.061 CST [32236]    LOG:  checkpoint starting: time
2010-03-01 17:35:19.185 CST [32236]    DEBUG:  recycled transaction log file "000000010000001C00000071"
2010-03-01 17:35:19.185 CST [32236]    LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.028 s, sync=0.000 s, total=0.124 s
2010-03-01 17:35:24.328 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:31.224 CST [32236]    DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:36:44.332 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:44.434 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  could not receive data from client: Connection timed out
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  unexpected EOF on client connection
2010-03-01 17:37:47.380 CST [3692] dba 10....(42816) dba LOG:  disconnection: session time: 2:11:15.303 user=dba database=dba host=... port=428

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
