Re: dsa_allocate() faliure

Justin Pryzby Sun, 10 Feb 2019 16:03:04 -0800

On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
> I couldn't reproduce the problem after 90 minutes with a test case
> that otherwise hits it within a couple of minutes.
...
> Note that this patch addresses the error "dsa_allocate could not find
> %zu free pages".  (The error "dsa_area could not attach to segment" is
> something else and apparently rarer.)


"could not attach" is the error reported early this morning while
stress-testing this patch with queued_alters queries in loops, so that's
consistent with your understanding.  And I guess it preceded getting stuck on
a lock.  I don't know how long passed between the first and the second, but
I'm guessing not long, and perhaps it was immediate, since the rest of the
processes were all stuck as in bug#15585 rather than ERRORing once every few
minutes.

I mentioned that "could not attach to segment" occurs in either the leader or
a parallel worker.  Most of the time it causes only an ERROR and doesn't wedge
all future parallel workers.  Maybe the bug#15585 "wedged" state only occurs
after some pattern of leader+worker failures (?)  I've just triggered bug#15585
again, but if there's a pattern, I don't see it.

Please let me know whether you're able to reproduce the "not attach" bug using
simultaneous loops around the queued_alters query; it's easy here.

Justin
