On Sat, Feb 16, 2019 at 3:38 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> I saw this error once last week while stress testing to reproduce earlier bugs,
> but tentatively thought it was a downstream symptom of those bugs (since
> fixed), and now wanted to check that #15585 and others were no longer
> reproducible.  Unfortunately I got this error while running same test case [2]
> as for previous bug ('could not attach').
>
> 2019-02-14 23:40:41.611 MST [32287] ERROR:  cannot unpin a segment that is not pinned
>
> On commit faf132449c0cafd31fe9f14bbf29ca0318a89058 (REL_11_STABLE including
> both of last week's post-11.2 DSA patches), I reproduced twice, once within
> ~2.5 hours, once within 30min.
>
> I'm not able to reproduce on master running overnight and now 16+ hours.

Oh, I think I know why: dsm_unpin_segment() contains another variant
of the race fixed by 6c0fb941 (that was for dsm_attach() being
confused by segments with the same handle that are concurrently going
away, but dsm_unpin_segment() does a handle lookup too, so it can be
confused by the same phenomenon).  Untested, but the fix is probably:

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index cfbebeb31d..23ccc59f13 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -844,8 +844,8 @@ dsm_unpin_segment(dsm_handle handle)
        LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
        for (i = 0; i < dsm_control->nitems; ++i)
        {
-               /* Skip unused slots. */
-               if (dsm_control->item[i].refcnt == 0)
+               /* Skip unused slots and segments that are concurrently going away. */
+               if (dsm_control->item[i].refcnt <= 1)
                        continue;

                /* If we've found our handle, we can stop searching. */
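
To illustrate why the skip condition matters, here is a standalone toy
model of the handle lookup. It is only a sketch, not the real dsm.c: the
names toy_item, toy_find_slot and toy_demo are made up for illustration,
and it assumes the refcnt convention of 2+ = active, 1 = going away,
0 = unused. It also assumes that a recycled handle can appear in two slots
at once, with the stale one earlier in the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one slot of the control table. */
typedef struct
{
    uint32_t handle;
    uint32_t refcnt;    /* assumed: 2+ = active, 1 = going away, 0 = unused */
} toy_item;

/*
 * Return the index of the first slot that matches 'handle' and whose
 * refcnt is greater than 'skip_at_or_below', or -1 if there is none.
 * Passing 0 models the old test (refcnt == 0); passing 1 models the
 * proposed test (refcnt <= 1).
 */
static int
toy_find_slot(const toy_item *items, size_t nitems,
              uint32_t handle, uint32_t skip_at_or_below)
{
    for (size_t i = 0; i < nitems; ++i)
    {
        /* Skip slots that the chosen rule treats as not live. */
        if (items[i].refcnt <= skip_at_or_below)
            continue;

        /* If we've found our handle, we can stop searching. */
        if (items[i].handle == handle)
            return (int) i;
    }
    return -1;
}

/* Two slots share handle 42: slot 0 is going away, slot 1 is live. */
static void
toy_demo(void)
{
    const toy_item items[] = {{42, 1}, {42, 2}};

    /* Old rule: the lookup stops at the stale slot. */
    assert(toy_find_slot(items, 2, 42, 0) == 0);

    /* Proposed rule: the stale slot is skipped and the live one is found. */
    assert(toy_find_slot(items, 2, 42, 1) == 1);
}
```

With the old rule the lookup lands on the slot that is going away, which
presumably is not pinned, and that would match the reported error. With
the proposed rule that slot is skipped and the live segment is found.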

-- 
Thomas Munro
http://www.enterprisedb.com
