On 7/23/23 17:27, PHO wrote:
On 7/22/23 22:41, Taylor R Campbell wrote:
Date: Sat, 22 Jul 2023 21:52:40 +0900
From: PHO <p...@cielonegro.org>
Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161]
vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc
Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in
gdb?
And, if you can get to the frame in gdb, what does gdb say &cb.wq is
in the vmw_fence_wait frame, and what cv is in the cv_destroy frame?
Let's confirm it is the cv you think it is -- I suspect it might be a
different one.
I just encountered the crash and could obtain a crash dump. It is indeed
the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait() but the contents
of cb do not make sense to me:
...
CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string?
I realized the cause of this:
static long vmw_fence_wait(struct dma_fence *f, bool intr,
    signed long timeout)
{
    ...
    if (likely(vmw_fence_obj_signaled(fence)))
        return timeout;
    ...
    spin_lock(f->lock);
    if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))
        goto out; // <-- THIS ONE
    if (intr && signal_pending(current)) {
        ret = -ERESTARTSYS;
        goto out; // <-- OR THIS
    }
#ifdef __NetBSD__
    DRM_INIT_WAITQUEUE(&cb.wq, "vmwgfxwf");
#else
    cb.task = current;
#endif
    ...
out:
    spin_unlock(f->lock);
#ifdef __NetBSD__
    DRM_DESTROY_WAITQUEUE(&cb.wq);
#endif
    ...
}
There were cases where the function was destroying a condvar that it
didn't initialize! Ugh, this is the very reason why I dislike C...
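
One possible way to avoid this class of bug is to initialize the object before the first "goto out", so that every path reaching the label sees a valid object. The following is only a minimal userland sketch of that idea, not the actual vmwgfx fix: mock_waitqueue, mock_init_waitqueue(), mock_destroy_waitqueue() and mock_fence_wait() are hypothetical stand-ins for the real condvar and DRM macros, and the "wait" itself is elided.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the condvar-backed wait queue. */
struct mock_waitqueue {
    bool initialized;
};

/* Number of init calls not yet matched by a destroy call. */
static int live_waitqueues;

static void mock_init_waitqueue(struct mock_waitqueue *wq)
{
    wq->initialized = true;
    live_waitqueues++;
}

static void mock_destroy_waitqueue(struct mock_waitqueue *wq)
{
    /* Destroying an object that was never initialized is the bug. */
    assert(wq->initialized);
    wq->initialized = false;
    live_waitqueues--;
}

/*
 * Hypothetical simplification of the control flow in the excerpt.
 * Returns the remaining timeout, -1 if interrupted, or 0 if the
 * (elided) wait is assumed to have timed out.
 */
static long mock_fence_wait(bool already_signaled, bool signal_pending,
    long timeout)
{
    struct mock_waitqueue wq;
    long ret = timeout;

    /* Initialize before any early exit that jumps to "out". */
    mock_init_waitqueue(&wq);

    if (already_signaled)
        goto out;
    if (signal_pending) {
        ret = -1;
        goto out;
    }

    /* ... the real function would sleep on the wait queue here ... */
    ret = 0;

out:
    /* Every path reaching this label has an initialized wq. */
    mock_destroy_waitqueue(&wq);
    return ret;
}
```

An alternative with the same effect would be to keep the initialization where it is and give the early exits a separate label that skips the destroy call. Which of the two suits the real driver better is a judgment call that this sketch does not settle.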