> Date: Mon, 17 Jul 2023 12:57:42 +0900 > From: PHO <p...@cielonegro.org> > > While further testing my DRM/KMS vmwgfx driver patches by playing > games/minetest from pkgsrc, I experienced inexplicable kernel panics on > this code: > > https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L289
A note about one of the problems there: spin_unlock(f->lock); ret = dma_fence_add_callback(f, &cb.base, vmwgfx_wait_cb); spin_lock(f->lock); #if defined(__NetBSD__) /* This is probably an upstream bug: there is a time window between * the call of vmw_fence_obj_signaled() above, and this * dma_fence_add_callback(). If the fence gets signaled during it * dma_fence_add_callback() returns -ENOENT, which is really not an * error condition. By the way, why the heck does dma_fence work in * this way? If a callback is being added but it lost the race, why * not just call it immediately as if it were just signaled? */ Not an upstream bug -- I introduced this bug when I patched the code that reached into the guts of what should be an opaque data structure for direct modification, to use drm_fence_add_callback instead. Need to look at the diff from upstream, not just the #ifdefs. Usually I use #ifdef __NetBSD__ to mark NetBSDisms separately from Linuxisms, and just patch the code when the patched code can use a common API that isn't any one OSism. In this case I don't even remember why I left any #ifdefs, was probably just working fast to make progress on a large code base, might have left the #ifdefs in for visual reference while I was editing the code and forgot to remove them. Could also simplify some of the lock/unlock cycles by doing that. > cv_destroy(&cv); // <-- Panics! > > It seldom panics on KASSERT(!cv_has_waiters(cv)) in cv_destroy() but not > always. The panic seems to happen when cv_timedwait_sig() exits due to > the timeout expiring before it gets signaled. Confused by `seldom panics on ... but not always' -- was that supposed to be `often panics on ... but not always', or is there a more frequent panic than KASSERT(!cv_has_waiters(cv))? What exactly is the panic you see and the evidence when you see it? Stack trace, gdb print cb in crash dump?