On 8/12/16 8:21 AM, Warner Losh wrote:
> On Fri, Aug 12, 2016 at 9:17 AM, Kenneth D. Merry <k...@freebsd.org> wrote:
>> On Fri, Aug 12, 2016 at 09:13:58 -0600, Warner Losh wrote:
>>> On Fri, Aug 12, 2016 at 9:11 AM, Kenneth D. Merry <k...@freebsd.org> wrote:
>>>> On Fri, Aug 12, 2016 at 13:38:21 +0300, Andrey V. Elsukov wrote:
>>>>> On 12.08.16 03:26, Bryan Drewery wrote:
>>>>>> On r303467 I ran into this:
>>>>>>
>>>>>> panic @ time 1470916206.652, thread 0xfffff8000412f000:
>>>>>> g_resize_provider_event but withered
>>>>>> cpuid = 0
>>>>>> Panic occurred in module kernel loaded at 0xffffffff80200000:
>>>>>>
>>>>>> Stack: --------------------------------------------------
>>>>>> kernel:kassert_panic+0x166
>>>>>> kernel:g_resize_provider_event+0x181
>>>>>> kernel:g_run_events+0x186
>>>>>> kernel:fork_exit+0x83
>>>>>> --------------------------------------------------
>>>>>>
>>>>>> No further information available unfortunately.
>>>>>
>>>>> This one is related to r302087 :)
>>>>
>>>> It looks like there is a race. I think we need to replace the KASSERT
>>>> in g_resize_provider_event() with a return in case the provider is
>>>> withered.
>>>>
>>>> I won't be able to work on or test this until sometime next week. So if
>>>> you guys want to go ahead and make the change, please do.
>>>
>>> But why are we calling g_resize_provider on a withered object? That's
>>> the part I don't understand in this thread.
>>
>> It isn't withered when the event is queued, but it is withered by the time
>> the event is executed.
>>
>> There is a check in g_resize_provider() to make sure it isn't withered. If
>> not, the event is queued. But once g_resize_provider_event() runs, it is
>> withered and we run into the KASSERT.
>>
>> There isn't adequate locking and ordering in there to prevent the race
>> from happening, so the assert should be replaced with an "if (withered)
>> return" statement.
>
> I'll grant that we may wither with outstanding events, but why is it
> withering? That seems odd. Either we're bogusly posting this event
> just before it will wither, or something else is bogusly withering it.
> Just removing the assert isn't going to fix the underlying issue.
>
> Back to Bryan: just to be clear, this is with the latest version of
> the code, and not the intermediate version that was fixed after
> numerous problems surfaced, right?
>
No, I was missing r303637. Hard to say if it is related... Andrey says
it's not. I haven't dived into it yet and it's so far only happened
once (out of a few tests). We do have various customizations but I'm
inclined to think it's the stock code having problems.

-- 
Regards,
Bryan Drewery
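
For reference, the race Kenneth describes in the quoted text (the provider passes the wither check when the resize is queued, then withers before the event runs) can be sketched in user-space C. This is only an illustration under stated assumptions, not the GEOM code: `struct provider`, `PF_WITHER`, `resize_provider()`, `resize_provider_event()` and `simulate()` are hypothetical stand-ins, and the "queue" is just a stored pending size. It shows the proposed "if (withered) return" in the event handler in place of an assertion; it does not address Warner's question of why the provider withers in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag; not the real GEOM flag name or value. */
#define PF_WITHER 0x1

/* Hypothetical stand-in for a provider; not the real GEOM structure. */
struct provider {
	int flags;
	long size;      /* current media size */
	long new_size;  /* size recorded by a pending resize */
};

/* Caller side: check the flag, then record ("queue") the resize. */
static bool
resize_provider(struct provider *pp, long size)
{
	assert(pp != NULL);
	if (pp->flags & PF_WITHER)
		return (false);         /* already withered: nothing queued */
	pp->new_size = size;
	return (true);
}

/*
 * Event side: runs some time after the check above, so the flag is
 * tested again and the event is dropped instead of asserting.
 */
static bool
resize_provider_event(struct provider *pp)
{
	assert(pp != NULL);
	if (pp->flags & PF_WITHER)
		return (false);         /* withered in the meantime: skip */
	pp->size = pp->new_size;
	return (true);
}

/* Queue a resize to 200, optionally wither before the event runs. */
static long
simulate(bool wither_in_between)
{
	struct provider p = { 0, 100, 0 };

	(void)resize_provider(&p, 200);
	if (wither_in_between)
		p.flags |= PF_WITHER;
	(void)resize_provider_event(&p);
	return (p.size);
}
```

In this sketch `simulate(false)` applies the resize, while `simulate(true)` leaves the size untouched because the event handler sees the wither flag and returns early.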