Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 12:47 PM, Tom Lane wrote: > Merlin Moncure writes: >> ...hm, I spoke too soon. So I deleted everything, and booted up a new >> instance 9.4 vanilla with asserts on and took no other action. >> Applying the script with no data activity fails an assertion every >> single tim

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Tom Lane
Merlin Moncure writes: > ...hm, I spoke too soon. So I deleted everything, and booted up a new > instance 9.4 vanilla with asserts on and took no other action. > Applying the script with no data activity fails an assertion every > single time: > TRAP: FailedAssertion("!(flags & 0x0010)", File: "d

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure wrote: > On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure wrote: >> I still haven't categorically ruled out pl/sh yet; that's something to >> keep in mind. > > Well, after bisection proved not to be fruitful, I replaced the pl/sh > calls with dummy c

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure wrote: > I still haven't categorically ruled out pl/sh yet; that's something to > keep in mind. Well, after bisection proved not to be fruitful, I replaced the pl/sh calls with dummy calls that approximated the same behavior and the problem went awa

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-24 Thread Martijn van Oosterhout
On Thu, Jan 22, 2015 at 03:50:03PM -0600, Merlin Moncure wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of th

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-23 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of the featur

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-23 Thread Jeff Janes
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure wrote: > > So far, the 'nasty' damage seems to generally if not always follow a > checksum failure and the checksum failures are always numerically > adjacent. For example: > > [cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING: page > verificat

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-22 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 5:20 PM, Peter Geoghegan wrote: > On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure wrote: >> ISTM the next step is to bisect the problem down over the weekend in >> order to narrow the search. If that doesn't turn up anything >> productive I'll look into taking other s

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas wrote: > It looks very much like a page has for some reason been moved to a > different block number. And that's exactly what Peter found out in his > investigation too; an index page was mysteriously copied to a different > block with ident

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure wrote: > ISTM the next step is to bisect the problem down over the weekend in > order to narrow the search. If that doesn't turn up anything > productive I'll look into taking other steps. That might be the quickest way to do it, provided you c

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund wrote: > Hi, > > On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: >> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: >> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: >> >> Running this test on another set of hardware to verify

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund wrote: > Is there any chance you can package this somehow so that others can run > it locally? It looks hard to find the actual bug here without adding > instrumentation to postgres. That's possible but involves a lot of complexity in the setup be

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Andres Freund
Hi, On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: > > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: > >> Running this test on another set of hardware to verify -- if this > >> turns out to be a false alarm which it may very

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Heikki Linnakangas
On 01/16/2015 04:05 PM, Merlin Moncure wrote: On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: Running this test on another set of hardware to verify -- if this turns out to be a false alarm which it may very well be, I can only of

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:05 AM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: >> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: >>> Running this test on another set of hardware to verify -- if this >>> turns out to be a false alarm which it may very wel

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: >> Running this test on another set of hardware to verify -- if this >> turns out to be a false alarm which it may very well be, I can only >> offer my apologies! I've never had a new

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: > Running this test on another set of hardware to verify -- if this > turns out to be a false alarm which it may very well be, I can only > offer my apologies! I've never had a new drive fail like that, in > that manner. I'll burn the other

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 4:03 PM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure wrote: >> Since it's possible the database is a loss, do you see any value in >> bootstrapping it again with checksums turned on? One point of note >> is that this is a brand spanking new SS

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure wrote: > Since it's possible the database is a loss, do you see any value in > bootstrapping it again with checksums turned on? One point of note > is that this is a brand spanking new SSD, maybe we need to rule out > hardware based corruption? hm!

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 1:15 PM, Andres Freund wrote: > Hi, > >> The plot thickens! I looped the test, still stock 9.4 as of this time >> and went to lunch. When I came back, the database was in recovery >> mode. Here is the rough sequence of events. >> > > Whoa. That looks scary. Did you see (s

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Andres Freund
On 2015-01-15 20:15:42 +0100, Andres Freund wrote: > > WARNING: did not find subXID 14955 in MyProc > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > WARNING: you don't own a lock of type RowExclusiveLock > > CONTEXT: PL/pgSQL function cdsrecon

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Andres Freund
Hi, > The plot thickens! I looped the test, still stock 9.4 as of this time > and went to lunch. When I came back, the database was in recovery > mode. Here is the rough sequence of events. > Whoa. That looks scary. Did you see (some of) those errors before? Most of them should have been emitte

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 8:02 AM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas > wrote: >> On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >>> >>> So now the question is: how did that inconsistency arise? It didn't >>> necessarily arise at the time of the (presumed) s

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Thu, Jan 15, 2015 at 6:02 AM, Merlin Moncure wrote: > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? May

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas wrote: > On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >> >> So now the question is: how did that inconsistency arise? It didn't >> necessarily arise at the time of the (presumed) split of block 2 to >> create 9. It could be that the opaque area

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Heikki Linnakangas
On 01/15/2015 03:23 AM, Peter Geoghegan wrote: So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 8:50 PM, Peter Geoghegan wrote: > I am mistaken on one detail here - blocks 2 and 9 are actually fully > identical. I still have no idea why, though. So, I've looked at it in more detail and it appears that the page of block 2 split at some point, thereby creating a new pa

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure wrote: > yeah. via: > cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', > s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv > header; My immediate observation here is that blocks 2 and 9 have identical metadata (fr
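The comparison of blocks 2 and 9 described above can be sketched as a small script run over the exported CSV. This is only an illustration, not something posted in the thread: it assumes the file has the `page` column added by the query plus the usual `bt_page_items()` columns (`itemoffset`, `ctid`, `itemlen`, `nulls`, `vars`, `data`), and the helper names `load_items` and `diff_blocks` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_items(csv_text):
    """Group bt_page_items() rows by the 'page' column added in the query."""
    pages = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        page = int(row.pop("page"))
        pages[page].append(row)
    return pages


def diff_blocks(pages, a, b):
    """Return the item offsets whose rows differ between blocks a and b.

    An offset present in only one of the two blocks also counts as a
    difference.
    """
    by_off_a = {r["itemoffset"]: r for r in pages[a]}
    by_off_b = {r["itemoffset"]: r for r in pages[b]}
    offsets = sorted(set(by_off_a) | set(by_off_b), key=int)
    return [int(o) for o in offsets if by_off_a.get(o) != by_off_b.get(o)]
```

Reading `/tmp/page_items.csv` into a string and calling `diff_blocks(load_items(text), 2, 9)` would list the offsets where the two blocks disagree; an empty list would mean the item data is identical.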

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan wrote: > This is great, but it's not exactly clear which bt_page_items() page > is which - some are skipped, but I can't be sure which. Would you mind > rewriting that query to indicate which block is under consideration by > bt_page_items()? yeah.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote: >> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: >>> (gdb) print BufferGetBlockNumber(buf) >>> $15 = 9 >>> >>> ..and it stays 9, continuing several times having set breakpoi

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure wrote: > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroyed at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the re

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote: > On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: >> (gdb) print BufferGetBlockNumber(buf) >> $15 = 9 >> >> ..and it stays 9, continuing several times having set breakpoint. > > > And the index involved? I'm pretty sure that this in

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: > (gdb) print BufferGetBlockNumber(buf) > $15 = 9 > > ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm pretty sure that this is an internal page, no? -- Peter Geoghegan -- Sent via pgsql-hac

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan wrote: > On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: >> Could you write some code to print out the block number (i.e. >> "BlockNumber blkno") if there are more than, say, 5 retries within >> _bt_moveright()? > > Obviously I mean that t

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: > Could you write some code to print out the block number (i.e. > "BlockNumber blkno") if there are more than, say, 5 retries within > _bt_moveright()? Obviously I mean that the block number should be printed, no matter whether or not the P

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 11:49 AM, Merlin Moncure wrote: > so it looks like nobody ever exits from _bt_moveright. any last > requests before I start bisecting down? Could you write some code to print out the block number (i.e. "BlockNumber blkno") if there are more than, say, 5 retries within _b
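The idea of reporting the block number once the move-right loop has retried more than about five times can be illustrated with a toy model. This is not PostgreSQL code and not the instrumentation actually used in the thread: pages are reduced to a dictionary of right-sibling links, all names are hypothetical, and the self-referencing link on block 9 in the usage note is an assumption chosen only to show a loop that never exits.

```python
# Toy model only: right_links maps a block number to its right sibling's
# block number (None means "rightmost page"). All names are hypothetical.

RETRY_REPORT_THRESHOLD = 5  # "more than, say, 5 retries"


def move_right(right_links, start_blkno, is_target, max_steps=100):
    """Follow right-links from start_blkno until is_target(blkno) is true.

    Returns (final_blkno, reported), where reported holds every block
    number visited after the retry threshold was exceeded. max_steps
    keeps the toy from spinning forever when the links form a cycle.
    """
    blkno = start_blkno
    retries = 0
    reported = []
    while not is_target(blkno):
        nxt = right_links.get(blkno)
        if nxt is None or retries >= max_steps:
            break
        retries += 1
        blkno = nxt
        if retries > RETRY_REPORT_THRESHOLD:
            # Report unconditionally once past the threshold.
            reported.append(blkno)
    return blkno, reported
```

With a healthy chain such as `{1: 2, 2: 3, 3: None}` the walk ends quickly and nothing is reported. With an assumed cycle such as `{2: 9, 9: 9}` and a target that is never found, the same block number is reported over and over, which is the kind of pattern the requested printout is meant to expose.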

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund wrote: > On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: >> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund >> wrote: >> > If you gdb in, and type 'fin' a couple times, to wait till the function >> > finishes, is there actually any progress? I'm

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 7:22 AM, Merlin Moncure wrote: > I'll try to pull commits that Peter suggested and see if that helps > (I'm getting ready to bring the database down). I can send the code > off-list if you guys think it'd help. Thanks for the code! I think it would be interesting to see

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > > If you gdb in, and type 'fin' a couple times, to wait till the function > > finishes, is there actually any progress? I'm wondering whether it's > > just many catalog accesses + contenti

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > If you gdb in, and type 'fin' a couple times, to wait till the function > finishes, is there actually any progress? I'm wondering whether it's > just many catalog accesses + contention, or some other > problem. Alternatively set a breakpoint

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > >> Merlin Moncure writes: > >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > >> >> What are the autovac processes doing (accordin

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Merlin Moncure writes: >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >> >> What are the autovac processes doing (according to pg_stat_activity)? >> >> > pid,running,waiting,query >> >

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: > Merlin Moncure writes: > > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > > can see above, it has a shorter runtime. So it was killed off once > > about a half hour ago which did not free up the logjam. However, AV > > spaw

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Andres Freund writes: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Hah, I suspected as much. Is that the one that's stuck in >> LockBufferForCleanup, or the other one that's got a similar backtrace >> to all the user processes? > Do you have a theory? Right now it primarily looks like cont

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure writes: > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > can see above, it has a shorter runtime. So it was killed off once > about a half hour ago which did not free up the logjam. However, AV > spawned it again and now it does not respond to cancel. Int

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > Merlin Moncure writes: > > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > >> What are the autovac processes doing (according to pg_stat_activity)? > > > pid,running,waiting,query > > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:05 AM, Tom Lane wrote: > Merlin Moncure writes: >> On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >>> What are the autovac processes doing (according to pg_stat_activity)? > >> pid,running,waiting,query >> 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.p

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure writes: > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >> What are the autovac processes doing (according to pg_stat_activity)? > pid,running,waiting,query > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class Hah, I suspected as much. Is that the one that'

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > Merlin Moncure writes: >> There were seven processes with that exact backtrace (except >> that randomly they are sleeping in the spinloop). Something else >> interesting: autovacuum has been running all night as well. Unlike >> the oth

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure writes: > There were seven processes with that exact backtrace (except > that randomly they are sleeping in the spinloop). Something else > interesting: autovacuum has been running all night as well. Unlike > the other processes however, cpu utilization does not register on

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 8:03 AM, Merlin Moncure wrote: > Here's a backtrace: > > #0 0x00750a97 in spin_delay () > #1 0x00750b19 in s_lock () > #2 0x00750844 in LWLockRelease () > #3 0x0073 in LockBuffer () > #4 0x004b2db4 in _bt_relandgetbuf () > #5
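Checking whether several stuck backends share one backtrace, apart from where in the spin loop each was sampled, can be sketched as below. This is a hedged illustration rather than anything from the thread: it assumes each backtrace has already been reduced to a list of function names (innermost frame first), and the helper name and the pids in the usage note are hypothetical.

```python
from collections import defaultdict

# Frames that only show where in the spin loop a process was sampled.
SPIN_FRAMES = {"spin_delay", "s_lock"}


def group_by_backtrace(backtraces):
    """Group pids whose stacks match once leading spin-loop frames are dropped.

    backtraces maps pid -> list of function names, innermost frame first.
    """
    groups = defaultdict(list)
    for pid, frames in backtraces.items():
        trimmed = list(frames)
        while trimmed and trimmed[0] in SPIN_FRAMES:
            trimmed.pop(0)
        groups[tuple(trimmed)].append(pid)
    return dict(groups)
```

Given, say, pid 101 sampled inside `spin_delay` and pid 102 sampled one level up in `LWLockRelease`, both land in the same group, while a process with a different stack forms its own group.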

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 7:24 PM, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure wrote: >> Some more information on what's happening: >> This is a ghetto logical replication engine that migrates data from >> SQL Server to postgres, consolidating a sharded database into a sing

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure wrote: > Some more information on what's happening: > This is a ghetto logical replication engine that migrates data from > SQL Server to postgres, consolidating a sharded database into a single > set of tables (of which there are only two). There is onl

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Andres Freund wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > Well, we do a _bt_moveright pretty early on,

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 4:05 PM, Tom Lane wrote: > I'm not convinced that Peter is barking up the right tree. I'm noticing > that the profiles seem rather skewed towards parser/planner work; so I > suspect the contention is probably on access to system catalogs. No > idea exactly why though. I

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 19:05:10 -0500, Tom Lane wrote: > Merlin Moncure writes: > > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan wrote: > >> In case it isn't clear, I think that the proximate cause here may well > >> be either one (or both) of commits > >> efada2b8e920adfdf7418862e939925d2acd1b89 and/

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Tom Lane
Merlin Moncure writes: > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan wrote: >> In case it isn't clear, I think that the proximate cause here may well >> be either one (or both) of commits >> efada2b8e920adfdf7418862e939925d2acd1b89 and/or >> 40dae7ec537c5619fc93ad602c62f37be786d161. Probably

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure wrote: >>> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >>> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >>> reference to _bt_binsrch() in ei

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:42 PM, Andres Freund wrote: > On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: >> On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund >> wrote: >> > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> >> I'm inclined to think that this is a livelock, and so the proble

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > hm, this is hand compiled now, I bet the sym

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 15:49:33 -0800, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund wrote: > > My guess is rather that it's contention on the freelist lock via > > StrategyGetBuffer's. I've seen profiles like this due to exactly that > > before - and it fits to parallel loading q

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:49 PM, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund wrote: >> My guess is rather that it's contention on the freelist lock via >> StrategyGetBuffer's. I've seen profiles like this due to exactly that >> before - and it fits to parallel loading

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund wrote: > My guess is rather that it's contention on the freelist lock via > StrategyGetBuffer's. I've seen profiles like this due to exactly that > before - and it fits to parallel loading quite well. I'm not saying you're wrong, but the breakdown of

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: > On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund wrote: > > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > >> I'm inclined to think that this is a livelock, and so the problem > >> isn't evident from the structure of the B-Tree, but it ca

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund wrote: > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> I'm inclined to think that this is a livelock, and so the problem >> isn't evident from the structure of the B-Tree, but it can't hurt to >> check. > > My guess is rather that it's conte

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > I'm inclined to think that this is a livelock, and so the problem > isn't evident from the structure of the B-Tree, but it can't hurt to > check. My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen p

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 2:29 PM, Merlin Moncure wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. Could you give more details, please? For example, I'd like to see representative data, or at least th

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 4:33 PM, Andres Freund wrote: > Hi, > > On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: >> On my workstation today (running vanilla 9.4.0) I was testing some new >> code that does aggressive parallel loading to a couple of tables. It >> ran ok several dozen times and fr

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
Hi, On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. It > ran ok several dozen times and froze up with no external trigger. > There were at most 8 active

[HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. It ran ok several dozen times and froze up with no external trigger. There were at most 8 active backends that were stuck (the loader is threaded to a cap) -- eac