On 2018-09-20 22:59:29 -0400, Tom Lane wrote:
> Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> > Andres pinged me off-list to point out this failure after my commit
> > fb389498be:
>
> > ! FATAL: semop(id=332464133) failed: Invalid argument
>
> I was just looking at that, and my guess is that it was caused by
> something doing an ipcrm or equivalent, and is unrelated to your patch.
> Especially since skink has succeeded with that patch in several other
> branches.
I'm (hopefully) the only person with access to that machine, and I
certainly didn't do so. Nor are there any scripts I know of that would
do so. There hasn't been a lot of instability on skink, so it's
certainly quite weird.

I'm quite suspicious of the logic around:

    /*
     * If we received a query cancel or termination signal, we will have
     * EINTR set here.  If the caller said that errors are OK here, check
     * for interrupts immediately.
     */
    if (errno == EINTR && elevel >= ERROR)
        CHECK_FOR_INTERRUPTS();

because it seems far from guaranteed to do anything meaningful, as I
don't see a guarantee that interrupts are active at that point (e.g. it
seems quite reasonable to hold an lwlock while resizing). AFAICT that
might cause problems at a later stage, because at that point we've not
adjusted the actual mapping, but *have* ftruncate()ed it. If there's
actual data in the mapping, that certainly could cause trouble.

In fact, while this commit has expanded the scope of the problem, I fail
to see how the error handling for resizing is correct. It's fine to fail
in the ftruncate() itself (at that point no changes have been made), but
I don't think it's currently OK for posix_fallocate() to ever error out.

It's not clear to me how that'd be problematic in 9.5 of all releases,
however.

> If it's repeatable, then it would be time to get excited.

Yea, I guess we'll have to wait :/.

Greetings,

Andres Freund
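To make the resize ordering concern concrete, here is a minimal sketch, assuming a plain POSIX file descriptor rather than PostgreSQL's actual DSM code. The helper resize_with_rollback() is hypothetical: it treats an ftruncate() failure as harmless (nothing has changed yet), but if posix_fallocate() fails afterwards it truncates the file back to its old size before reporting the error, so a failure never leaves the file at the new size. It only restores the file size; it does not adjust any mapping, and it is not what PostgreSQL or this thread does.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>      /* posix_fallocate */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate */

/*
 * Hypothetical helper (not PostgreSQL code): grow the file behind fd from
 * old_size to new_size.  Returns 0 on success or an errno value on failure.
 * Intended for growing only; shrinking discards data that no rollback can
 * bring back.
 */
static int
resize_with_rollback(int fd, off_t old_size, off_t new_size)
{
    int rc;

    /* A failure here is harmless: the file has not been changed yet. */
    if (ftruncate(fd, new_size) != 0)
        return errno;

    /*
     * posix_fallocate() returns an error number instead of setting errno.
     * Retrying on EINTR keeps the sketch simple, but it also means a
     * pending cancel cannot interrupt the allocation.
     */
    do
    {
        rc = posix_fallocate(fd, 0, new_size);
    } while (rc == EINTR);

    if (rc != 0)
    {
        /*
         * The ftruncate() above already succeeded, so undo it before
         * reporting the error; otherwise the caller would see a file whose
         * size no longer matches what it believes it has.
         */
        (void) ftruncate(fd, old_size);
        return rc;
    }
    return 0;
}
```

On a freshly created, empty file, resize_with_rollback(fd, 0, 8192) should return 0 and leave the file 8192 bytes long. Whether an unconditional EINTR retry is acceptable depends on whether the caller needs the operation to be cancellable, which is exactly the open question about the CHECK_FOR_INTERRUPTS() path.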