On Sat, Feb 16, 2019 at 3:38 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> I saw this error once last week while stress testing to reproduce earlier
> bugs,
> but tentatively thought it was a downstream symptom of those bugs (since
> fixed), and now wanted to check that #15585 and others were no longer
> reproducible. Unfortunately I got this error while running same test case [2]
> as for previous bug ('could not attach').
>
> 2019-02-14 23:40:41.611 MST [32287] ERROR: cannot unpin a segment that is
> not pinned
>
> On commit faf132449c0cafd31fe9f14bbf29ca0318a89058 (REL_11_STABLE including
> both of last week's post-11.2 DSA patches), I reproduced twice, once within
> ~2.5 hours, once within 30min.
>
> I'm not able to reproduce on master running overnight and now 16+hours.

Oh, I think I know why: dsm_unpin_segment() contains another variant of
the race fixed by 6c0fb941. That fix was for dsm_attach() being confused
by segments with the same handle that are concurrently going away, but
dsm_unpin_segment() does a handle lookup too, so it can be confused by
the same phenomenon. Untested, but the fix is probably:

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index cfbebeb31d..23ccc59f13 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -844,8 +844,8 @@ dsm_unpin_segment(dsm_handle handle)
        LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
        for (i = 0; i < dsm_control->nitems; ++i)
        {
-               /* Skip unused slots. */
-               if (dsm_control->item[i].refcnt == 0)
+               /* Skip unused slots and segments that are concurrently going away. */
+               if (dsm_control->item[i].refcnt <= 1)
                        continue;
 
                /* If we've found our handle, we can stop searching. */

-- 
Thomas Munro
http://www.enterprisedb.com