On 7/23/23 17:27, PHO wrote:
On 7/22/23 22:41, Taylor R Campbell wrote:
Date: Sat, 22 Jul 2023 21:52:40 +0900
From: PHO <p...@cielonegro.org>

Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161]
vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc

Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in
gdb?

And, if you can get to the frame in gdb, what does gdb say &cb.wq is
in the vmw_fence_wait frame, and what cv is in the cv_destroy frame?

Let's confirm it is the cv you think it is -- I suspect it might be a
different one.

I just encountered the crash and could obtain a crash dump. It is indeed the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait() but the contents of cb do not make sense to me:

...

CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string?

I realized the cause of this:

static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout)
{
        ...
        if (likely(vmw_fence_obj_signaled(fence)))
                return timeout;
        ...
        spin_lock(f->lock);

        if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))
                goto out; // <-- THIS ONE

        if (intr && signal_pending(current)) {
                ret = -ERESTARTSYS;
                goto out; // <-- OR THIS
        }

#ifdef __NetBSD__
        DRM_INIT_WAITQUEUE(&cb.wq, "vmwgfxwf");
#else
        cb.task = current;
#endif
        ...
out:
        spin_unlock(f->lock);
#ifdef __NetBSD__
        DRM_DESTROY_WAITQUEUE(&cb.wq);
#endif
        ...
}

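Both "goto out" paths jump past DRM_INIT_WAITQUEUE(), yet the code under "out:" always calls DRM_DESTROY_WAITQUEUE(). So there were cases where the function was destroying a condvar that it didn't initialize, which would explain why the contents of cb.wq looked like garbage! Ugh, this is the very reason why I dislike C...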
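
One possible way to avoid this, shown as a minimal user-space sketch and not as the actual vmwgfx change: make sure the early exits land on a label that comes after the destroy call. Everything here is hypothetical and only for illustration: struct fake_cv, fake_cv_init(), fake_cv_destroy(), live_cvs and wait_sketch() are stand-ins, not the NetBSD condvar or DRM waitqueue API, and -1 stands in for -ERESTARTSYS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a condvar; NOT the NetBSD kcondvar_t. */
struct fake_cv {
        unsigned magic;
};

#define FAKE_CV_MAGIC 0xC0FFEEu

/* Number of condvars that are initialized and not yet destroyed. */
static int live_cvs;

static void fake_cv_init(struct fake_cv *cv)
{
        cv->magic = FAKE_CV_MAGIC;
        live_cvs++;
}

static void fake_cv_destroy(struct fake_cv *cv)
{
        /* Destroying something that was never initialized trips here. */
        assert(cv->magic == FAKE_CV_MAGIC);
        cv->magic = 0;
        live_cvs--;
}

/*
 * Same control-flow shape as the excerpt above, with the locking and
 * the actual wait elided. The early exits jump to "out", which sits
 * after the destroy call, so wq is only destroyed if it was set up.
 */
static long wait_sketch(bool already_signaled, bool interrupted, long timeout)
{
        struct fake_cv wq;
        long ret = timeout;

        if (already_signaled)
                goto out;       /* wq is still uninitialized here */

        if (interrupted) {
                ret = -1;       /* stands in for -ERESTARTSYS */
                goto out;       /* wq is still uninitialized here too */
        }

        fake_cv_init(&wq);
        /* ... the wait itself would go here ... */
        ret = 0;                /* pretend the wait timed out */
        fake_cv_destroy(&wq);
out:
        /* spin_unlock(f->lock) would go here in the real function */
        return ret;
}
```

Calling wait_sketch(true, false, 10) or wait_sketch(false, true, 10) takes an early exit without touching wq, and live_cvs stays at 0 on every path. Another option would be to move the init call above the first "goto out", so that the unconditional destroy is always paired with an init.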
