On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinn...@iki.fi>
> wrote:
>
>>
>> Both. Here's the patch.
>>
>> Previously, LWLockAcquireWithVar set the variable associated with the
>> lock atomically with acquiring it. Before the lwlock-scalability changes,
>> that was straightforward because you held the spinlock anyway, but it's a
>> lot harder/expensive now. So I changed the way acquiring a lock with a
>> variable works. There is now a separate flag, LW_FLAG_VAR_SET, which
>> indicates that the current lock holder has updated the variable. The
>> LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(),
>> which always clears the LW_FLAG_VAR_SET flag, and you can call
>> LWLockUpdateVar() after that if you want to set the variable immediately.
>> LWLockWaitForVar() always waits if the flag is not set, i.e. it will not
>> return regardless of the variable's value, if the current lock-holder has
>> not updated it yet.
>>
>>
> I ran this for a while without casserts and it seems to work. But with
> casserts, I get failures in the autovac process on the GIN index.
>
> I don't see how this is related to the LWLock issue, but I didn't see it
> without your patch. Perhaps the system just didn't survive long enough to
> uncover it without the patch (although it shows up pretty quickly). It
> could just be an overzealous Assert, since the casserts off didn't show
> problems.
>
> bt and bt full are shown below.
>
> Cheers,
>
> Jeff
>
> #0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
> #1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
> #2 0x0000000000930b7a in ExceptionalCondition (
> conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
> (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
> "FailedAssertion",
> fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
> #3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
> ginvacuum.c:713
>

It now looks like this *is* unrelated to the LWLock issue. The assert
that it is tripping over was added just recently (302ac7f27197855afa8c)
and so I had not been testing under its presence until now. It looks
like it is finding all-zero pages (index extended but then a crash before
initializing the page?) and it doesn't like them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably before this assert, such pages would just be permanently
orphaned.

Cheers,

Jeff
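
To illustrate the all-zero-page explanation above, here is a minimal,
self-contained C sketch of why the quoted Assert condition cannot hold on
such a page. SketchPageHeader is a hypothetical, simplified stand-in for
PostgreSQL's PageHeaderData (the field layout is assumed for illustration
only), and the sketch_* names are made up; this is not the ginvacuum.c
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Page size assumed from the char[8192] cast in the gdb session. */
#define SKETCH_BLCKSZ 8192

/* Hypothetical, simplified stand-in for PostgreSQL's PageHeaderData. */
typedef struct SketchPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;          /* offset of the special space */
    uint16_t pd_pagesize_version;
    uint32_t pd_prune_xid;
    uint32_t pd_linp[];           /* line pointers start after the header */
} SketchPageHeader;

/* True if every byte in the page buffer is zero. */
bool sketch_page_is_all_zero(const char *page, size_t len)
{
    assert(page != NULL);
    for (size_t i = 0; i < len; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/*
 * Same shape as the failing condition in the backtrace: the special-space
 * offset must not point inside the fixed-size page header.
 */
bool sketch_special_is_sane(const char *page)
{
    SketchPageHeader hdr;

    assert(page != NULL);
    /* Copy the header out to avoid alignment and aliasing problems. */
    memcpy(&hdr, page, sizeof(hdr));
    return (size_t) hdr.pd_special >= offsetof(SketchPageHeader, pd_linp);
}
```

On a zero-filled buffer, pd_special reads as 0, which is smaller than the
header size, so the second check fails for any page that was never
initialized. A caller that wants to tolerate such pages would have to test
for them before applying the check.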