Re: [HACKERS] SR standby hangs

2011-04-26 Thread Andrew Dunstan
On 04/26/2011 05:31 PM, Tom Lane wrote: Andrew Dunstan writes: (gdb) p 'postmaster.c'::ProcGlobal->startupProcPid $1 = 0 (gdb) p 'postmaster.c'::ProcGlobal->startupProc $2 = (PGPROC *) 0x0 Oh ... you need this patch: http://git.postgresql.org/gitweb?p=postgresql.git&a=commitdiff&h=9bb1ddec4

Re: [HACKERS] SR standby hangs

2011-04-26 Thread Tom Lane
Andrew Dunstan writes:
> (gdb) p 'postmaster.c'::ProcGlobal->startupProcPid
> $1 = 0
> (gdb) p 'postmaster.c'::ProcGlobal->startupProc
> $2 = (PGPROC *) 0x0

Oh ... you need this patch: http://git.postgresql.org/gitweb?p=postgresql.git&a=commitdiff&h=9bb1ddec4cd998c5fbac278a54d8ad5a5011e4e1
Upda

Re: [HACKERS] SR standby hangs

2011-04-26 Thread Andrew Dunstan
On 04/26/2011 04:45 PM, Tom Lane wrote: I wrote: Well, that's pretty interesting: refcount is only 1, and the BM_PIN_COUNT_WAITER flag is not set. AFAICS this *must* mean that the buffer had been pinned and whoever had it (presumably bgwriter) did UnpinBuffer(). So it appears that the signal

Re: [HACKERS] SR standby hangs

2011-04-26 Thread Tom Lane
I wrote:
> Well, that's pretty interesting: refcount is only 1, and the
> BM_PIN_COUNT_WAITER flag is not set. AFAICS this *must* mean that the
> buffer had been pinned and whoever had it (presumably bgwriter) did
> UnpinBuffer(). So it appears that the signal just plain got lost :-(,
> which sug
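
To make the reasoning about refcount and BM_PIN_COUNT_WAITER concrete, here is a minimal toy model in Python. It is only a sketch of the handshake as described in this thread, not PostgreSQL's actual code: the class `ToyBuffer`, its methods, the `timeout` parameter and the `demo` helper are all hypothetical names invented for illustration, and a Python semaphore stands in for the per-process semaphore seen in the backtrace. The waiter sets a flag and sleeps; the backend that drops the last foreign pin clears the flag and wakes it. A buffer left with refcount 1 and the flag cleared while the waiter is still asleep would therefore suggest that the wakeup was sent but never arrived.

```python
import threading


class ToyBuffer:
    """Toy model of a shared buffer header: a pin count plus a waiter flag."""

    def __init__(self):
        self.header_lock = threading.Lock()   # stands in for the buffer header lock
        self.refcount = 0                     # number of pins currently held
        self.pin_count_waiter = False         # models the BM_PIN_COUNT_WAITER flag
        self.wakeup = threading.Semaphore(0)  # models the waiter's semaphore

    def pin(self):
        with self.header_lock:
            self.refcount += 1

    def unpin(self):
        with self.header_lock:
            self.refcount -= 1
            # Only the waiter's own pin remains: clear the flag and wake it.
            if self.pin_count_waiter and self.refcount == 1:
                self.pin_count_waiter = False
                self.wakeup.release()

    def lock_for_cleanup(self, timeout=None):
        """Caller must already hold one pin. Returns True once it is the only pin,
        or False if no wakeup arrives within `timeout` seconds."""
        while True:
            with self.header_lock:
                if self.refcount == 1:
                    return True
                self.pin_count_waiter = True
            # Sleep until unpin() signals us. With timeout=None, a wakeup that
            # never arrives would leave this call blocked forever.
            if not self.wakeup.acquire(timeout=timeout):
                return False


def demo():
    buf = ToyBuffer()
    buf.pin()                             # the waiter's own pin
    buf.pin()                             # a second pin held by another process
    t = threading.Timer(0.05, buf.unpin)  # the other process unpins a bit later
    t.start()
    got = buf.lock_for_cleanup(timeout=5.0)
    t.join()
    return got, buf.refcount, buf.pin_count_waiter


if __name__ == "__main__":
    print(demo())
```

In this model the flag is cleared only on the path that also issues the wakeup, which is the inference drawn above from the gdb output.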

Re: [HACKERS] SR standby hangs

2011-04-26 Thread Andrew Dunstan
On 04/26/2011 04:28 PM, Tom Lane wrote: Andrew Dunstan writes: This has happened again. This time we have some debug info available, and can possibly get more, if people tell me what will be helpful: (gdb) f 2 #2 0x005de735 in LockBufferForCleanup (buffer=310163) at bu

Re: [HACKERS] SR standby hangs

2011-04-26 Thread Tom Lane
Andrew Dunstan writes:
> This has happened again. This time we have some debug info available,
> and can possibly get more, if people tell me what will be helpful:
> (gdb) f 2
> #2 0x005de735 in LockBufferForCleanup (buffer=310163) at bufmgr.c:2432
> 2432

Re: [HACKERS] SR standby hangs

2011-04-26 Thread Andrew Dunstan
On 02/22/2011 07:55 AM, Robert Haas wrote: On Sun, Feb 20, 2011 at 12:39 PM, Greg Stark wrote: On Fri, Feb 18, 2011 at 6:59 PM, Andrew Dunstan wrote: The server is running as a warm standby, and the client's application tries to connect to both the master and the slave, accepting whichever

Re: [HACKERS] SR standby hangs

2011-02-28 Thread Robert Haas
On Tue, Feb 22, 2011 at 11:34 AM, Tom Lane wrote: > Greg Stark writes: >> On Tue, Feb 22, 2011 at 12:55 PM, Robert Haas wrote: >>> A little OT, but ISTM that the buffer pin mechanism by its nature is >>> prone to lock upgrade hazards. > >> Except that pins don't block exclusive locks so there's

Re: [HACKERS] SR standby hangs

2011-02-22 Thread Tom Lane
Greg Stark writes: > On Tue, Feb 22, 2011 at 12:55 PM, Robert Haas wrote: >> A little OT, but ISTM that the buffer pin mechanism by its nature is >> prone to lock upgrade hazards. > Except that pins don't block exclusive locks so there's no deadlock risk. > The oddity here is on Vacuums super-e

Re: [HACKERS] SR standby hangs

2011-02-22 Thread Greg Stark
On Tue, Feb 22, 2011 at 12:55 PM, Robert Haas wrote: > A little OT, but ISTM that the buffer pin mechanism by its nature is > prone to lock upgrade hazards.  A cleanup lock is essentially an > access exclusive lock on the buffer, while a buffer pin is an access > share lock.  In the middle, we hav

Re: [HACKERS] SR standby hangs

2011-02-22 Thread Robert Haas
On Sun, Feb 20, 2011 at 12:39 PM, Greg Stark wrote: > On Fri, Feb 18, 2011 at 6:59 PM, Andrew Dunstan wrote: >> The server is running as a warm standby, and the client's application tries >> to connect to both the master and the slave, accepting whichever lets it >> connect (hence hot standby is

Re: [HACKERS] SR standby hangs

2011-02-20 Thread Greg Stark
On Fri, Feb 18, 2011 at 6:59 PM, Andrew Dunstan wrote: > The server is running as a warm standby, and the client's application tries > to connect to both the master and the slave, accepting whichever lets it > connect (hence hot standby is not turned on). >... >   #2  0x005de645 in LockBuf

Re: [HACKERS] SR standby hangs

2011-02-18 Thread Andrew Dunstan
On 02/18/2011 03:42 PM, Robert Haas wrote: On Fri, Feb 18, 2011 at 2:50 PM, Tom Lane wrote: Robert Haas writes: On Fri, Feb 18, 2011 at 2:35 PM, Andrew Dunstan wrote: It's not running HS, so there's no query to wait on. That seems to imply that recovery has leaked a buffer pin. No, beca

Re: [HACKERS] SR standby hangs

2011-02-18 Thread Robert Haas
On Fri, Feb 18, 2011 at 2:50 PM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Feb 18, 2011 at 2:35 PM, Andrew Dunstan wrote: >>> It's not running HS, so there's no query to wait on. > >> That seems to imply that recovery has leaked a buffer pin. > > No, because then the sanity check in LockB

[HACKERS] SR standby hangs

2011-02-18 Thread Andrew Dunstan
PostgreSQL Experts Inc has a client with a 9.0.2 streaming replication server that somehow becomes wedged after running for some time. The server is running as a warm standby, and the client's application tries to connect to both the master and the slave, accepting whichever lets it connect

Re: [HACKERS] SR standby hangs

2011-02-18 Thread Tom Lane
Robert Haas writes:
> On Fri, Feb 18, 2011 at 2:35 PM, Andrew Dunstan wrote:
>> It's not running HS, so there's no query to wait on.
> That seems to imply that recovery has leaked a buffer pin.

No, because then the sanity check in LockBufferForCleanup would have fired: /* There should

Re: [HACKERS] SR standby hangs

2011-02-18 Thread Robert Haas
On Fri, Feb 18, 2011 at 2:35 PM, Andrew Dunstan wrote: > It's not running HS, so there's no query to wait on. That seems to imply that recovery has leaked a buffer pin. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: [HACKERS] SR standby hangs

2011-02-18 Thread Andrew Dunstan
On 02/18/2011 02:23 PM, Tom Lane wrote: Andrew Dunstan writes: The symptom is that the recovery process blocks forever on a semaphore. We've crashed it and got the following backtrace: #0 0x003493ed5337 in semop () from /lib64/libc.so.6 #1 0x005bd103 in PGSemaphoreLock

Re: [HACKERS] SR standby hangs

2011-02-18 Thread Robert Haas
On Fri, Feb 18, 2011 at 2:16 PM, Andrew Dunstan wrote: > I'm not quite sure where to start digging. Has anyone else seen > something similar? Our consultant reports having seen a similar problem > elsewhere, at a client who was running hot standby on 9.0.1, but the > problem did not recur, as it d

Re: [HACKERS] SR standby hangs

2011-02-18 Thread Tom Lane
Andrew Dunstan writes:
> The symptom is that the recovery process blocks forever on a semaphore.
> We've crashed it and got the following backtrace:
> #0 0x003493ed5337 in semop () from /lib64/libc.so.6
> #1 0x005bd103 in PGSemaphoreLock (sema=0x2b14986aec38, interruptOK=

[HACKERS] SR standby hangs

2011-02-18 Thread Andrew Dunstan
[this time from the right address - apologies if we get a duplicate] PostgreSQL Experts Inc has a client with a 9.0.2 streaming replication server that somehow becomes wedged after running for some time. The server is running as a warm standby, and the client's application tries to connect to