On 08/13/2018 04:49 PM, Andres Freund wrote:
> Hi,
>
> On 2018-08-13 11:46:30 -0300, Alvaro Herrera wrote:
>> On 2018-Aug-11, Tomas Vondra wrote:
>>
>>> Hmmm, it's difficult to compare "bt full" output, but my backtraces look
>>> somewhat different (and all the backtraces I'm seeing are 100% exactly
>>> the same). Attached for comparison.
>>
>> Hmm, looks similar enough to me -- at the bottom you have the executor
>> doing its thing, then an AcceptInvalidationMessages in the middle
>> section atop which sit a few more catalog accesses, and further up from
>> that you have another AcceptInvalidationMessages with more catalog
>> accesses. AFAICS that's pretty much the same thing Andres was
>> describing.
>
> It's somewhat different because it doesn't seem to involve a reload of a
> nailed table, which my traces did. I wasn't able to reproduce the crash
> more than once, so I'm not at all sure how to properly verify the issue.
> I'd appreciate it if Tomas could try to do so again with the small patch I
> provided.

I'll try in the evening. I've tried reproducing it on my laptop, but I
can't make that happen for some reason - I know I've seen some crashes
here, but all the reproducers were from the workstation I have at home.
I wonder if there's some subtle difference between the two boxes, making
it more likely on one of them ... the whole environment (distribution,
packages, compiler, ...) should be exactly the same, though. The only
thing I can think of is different CPU speed, possibly making some race
conditions more/less likely. No idea.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services