Introduction.

The affected system is stable/14, amd64.
The kernel is custom and is configured with INVARIANTS.

The problem started happening fairly reliably after a recent package upgrade. I suspect that the trigger is linux-nvidia-libs-570.124.04, but that the bug itself is in the FreeBSD Linux emulation.

The reason for my suspicion is that the crash happens when starting a graphical Linux application in a Linux jail, and the crash involves a graphics-related character device.

For the record, the jail itself, including the application, hasn't been changed.
Also, I haven't touched the base system recently.

Details.

VNASSERT failed: old > 0 not true at sys/kern/vfs_subr.c:3361 (vrefact)
0xfffff802945df380: type VCHR state VSTATE_CONSTRUCTED op 0xffffffff8127b648
    usecount 1, writecount 0, refcount 39 seqc users 0 rdev 0xfffff8004565f400
    hold count flags ()
    flags ()
    lock type devfs: UNLOCKED
        dev drm/128
panic: vrefact: wrong use count 0
cpuid = 1
time = 1742796535
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8061eadb = db_trace_self_wrapper+0x2b/frame 0xfffffe02476a0780
kdb_backtrace() at 0xffffffff80956a57 = kdb_backtrace+0x37/frame 0xfffffe02476a0830
vpanic() at 0xffffffff80907629 = vpanic+0x169/frame 0xfffffe02476a0970
panic() at 0xffffffff80907403 = panic+0x43/frame 0xfffffe02476a09d0
vrefact() at 0xffffffff809f08e4 = vrefact+0xb4/frame 0xfffffe02476a09f0
fgetvp_lookup() at 0xffffffff808ac718 = fgetvp_lookup+0x88/frame 0xfffffe02476a0a30
namei_setup() at 0xffffffff809e07ba = namei_setup+0x15a/frame 0xfffffe02476a0a80
namei_emptypath() at 0xffffffff809e0499 = namei_emptypath+0x49/frame 0xfffffe02476a0ae0
namei() at 0xffffffff809e029f = namei+0x66f/frame 0xfffffe02476a0b40
linux_kern_statat() at 0xffffffff8a09d24c = linux_kern_statat+0xfc/frame 0xfffffe02476a0c70
linux_newfstatat() at 0xffffffff8a09cfed = linux_newfstatat+0x6d/frame 0xfffffe02476a0e00
amd64_syscall() at 0xffffffff80c79f79 = amd64_syscall+0x189/frame 0xfffffe02476a0f30
fast_syscall_common() at 0xffffffff80c4fb9b = fast_syscall_common+0xf8/frame 0xfffffe02476a0f30
--- syscall (262, Linux ELF64, linux_newfstatat), rip = 0x813f13eee, rsp = 0x7fffffffbd28, rbp = 0 ---

As far as I understand, the Linux process makes an fstatat (newfstatat) system call with the AT_EMPTY_PATH flag, an empty path, and a file descriptor that refers to the opened /dev/drm/128 device.
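For illustration, the userland side of such a call might look roughly like the sketch below. This is not a reproducer and not taken from the application: the helper name is_chardev_via_empty_path is made up, and any path can be passed in (for example /dev/null as a harmless stand-in for the DRM node). It only shows the shape of an fstatat call that stats the descriptor itself through an empty path.

```c
#define _GNU_SOURCE            /* glibc exposes AT_EMPTY_PATH only with this */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#if !defined(AT_EMPTY_PATH) && defined(__linux__)
#define AT_EMPTY_PATH 0x1000   /* Linux UAPI value, used only as a fallback */
#endif

/*
 * Hypothetical helper: open 'path', then stat the resulting descriptor
 * with an empty path and AT_EMPTY_PATH, as the crashing call appears to do.
 * Returns 1 for a character device, 0 for anything else, -1 on error.
 */
int
is_chardev_via_empty_path(const char *path)
{
	struct stat st;
	int fd, rc;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);

	/* The empty string plus AT_EMPTY_PATH means "stat fd itself". */
	rc = fstatat(fd, "", &st, AT_EMPTY_PATH);
	close(fd);
	if (rc != 0)
		return (-1);

	return (S_ISCHR(st.st_mode) ? 1 : 0);
}
```

In the crash, the equivalent call was made by a Linux binary on fd 9, so it went through linux_newfstatat rather than the native fstatat.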

Given that the crash happens in fgetvp_lookup -> vrefact, I think that it's unlikely that there is a problem in that call path. I believe that the problem is elsewhere in the Linux emulation code for working with character devices.

I think that the panic means that the corresponding file descriptor was open but the associated vnode had a usecount of zero.
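To make that reasoning concrete, here is a toy userland model of the check that fired. It is not the kernel code: struct toy_vnode and toy_vrefact are hypothetical stand-ins, and the model assumes that vrefact increments the use count first and then asserts that the old value was positive. That assumption would be consistent with the output above, where the message reports a use count of 0 while the vnode printed afterwards shows usecount 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the vnode use counter; not a kernel structure. */
struct toy_vnode {
	atomic_int usecount;
};

/*
 * Toy version of the check: take one more use reference and return the
 * value seen before the increment. A caller that already holds a use
 * reference should never see 0 here; in the kernel with INVARIANTS the
 * "old > 0" assertion fails in that case.
 */
int
toy_vrefact(struct toy_vnode *vp)
{
	assert(vp != NULL);
	return (atomic_fetch_add(&vp->usecount, 1));
}
```

Starting from a counter of 0, toy_vrefact returns 0 (the value the assertion rejects) and leaves the counter at 1.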

It looks like DTYPE_DEV (11) is used only in the linuxkpi code, e.g., linux_dev_fdopen.

Some info from kgdb.

(kgdb) bt
#0  __curthread () at sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at sys/kern/kern_shutdown.c:423
#2  0xffffffff80907121 in kern_reboot (howto=260) at sys/kern/kern_shutdown.c:541
#3  0xffffffff80907698 in vpanic (fmt=0xffffffff80e35cf8 "%s: wrong use count %d", ap=0xfffffe01adc909b0) at sys/kern/kern_shutdown.c:1021
#4  0xffffffff80907403 in panic (fmt=<unavailable>) at sys/kern/kern_shutdown.c:945
#5  0xffffffff809f08e4 in vrefact (vp=0xfffff8035b4bb700) at sys/kern/vfs_subr.c:3361
#6  0xffffffff808ac718 in fgetvp_lookup (ndp=ndp@entry=0xfffffe01adc90b58, vpp=vpp@entry=0xfffffe01adc90ac8) at sys/kern/kern_descrip.c:3134
#7  0xffffffff809e07ba in namei_setup (ndp=ndp@entry=0xfffffe01adc90b58, dpp=dpp@entry=0xfffffe01adc90ac8, pwdp=pwdp@entry=0xfffffe01adc90ac0) at sys/kern/vfs_lookup.c:383
#8  0xffffffff809e0499 in namei_emptypath (ndp=ndp@entry=0xfffffe01adc90b58) at sys/kern/vfs_lookup.c:466
#9  0xffffffff809e029f in namei (ndp=ndp@entry=0xfffffe01adc90b58) at sys/kern/vfs_lookup.c:687
#10 0xffffffff8a09d24c in linux_kern_statat (td=0xfffff804d50d7000, flag=16384, fd=9, path=0x813fd846f <error: Cannot access memory at address 0x813fd846f>, pathseg=UIO_USERSPACE, sbp=sbp@entry=0xfffffe01adc90c80)
    at sys/compat/linux/linux_stats.c:103
#11 0xffffffff8a09cfed in linux_newfstatat (td=<unavailable>, td@entry=<error reading variable: value is not available>, args=0xfffff804d50d7400, args@entry=<error reading variable: value is not available>)
    at sys/compat/linux/linux_stats.c:620
#12 0xffffffff80c79f79 in syscallenter (td=0xfffff804d50d7000) at sys/amd64/amd64/../../kern/subr_syscall.c:191
#13 amd64_syscall (td=0xfffff804d50d7000, traced=<optimized out>) at sys/amd64/amd64/trap.c:1206

(kgdb) p *vp
$1 = {v_type = VCHR, v_state = VSTATE_CONSTRUCTED, v_irflag = 0, v_seqc = 0, v_nchash = 1973399077, v_hash = 56314807, v_op = 0xffffffff8127b648 <devfs_specops>, v_data = 0xfffff80055005200, v_mount = 0xfffffe0150b46100, v_nmntvnodes = { tqe_next = 0xfffff8038000da80, tqe_prev = 0xfffff8035b4bb8e8}, {v_mountedhere = 0xfffff800452b9400, v_unpcb = 0xfffff800452b9400, v_rdev = 0xfffff800452b9400, v_fifoinfo = 0xfffff800452b9400}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb758}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0xffffffff80d1cf3c "devfs", lo_flags = 116588544, lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_pri = 64, lk_timo = 51}, v_interlock = {lock_object = {lo_name = 0xffffffff80db24c1 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0xfffff8035b4bb770, v_vnodelist = {tqe_next = 0xfffff8035b4bbc40, tqe_prev = 0xfffff80369f48280}, v_lazylist = {tqe_next = 0x0, tqe_prev = 0x0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0xffffffff80df4394 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0xffffffff812b7190 <buf_ops_bio>, bo_object = 0x0, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xfffff8035b4bb700, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb828}, bv_root = {pt_root = 0x1}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb848}, bv_root = {pt_root = 0x1}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_domain = 0, bo_bsize = 512}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb890}, rl_currdep = 0x0}, v_holdcnt = 32, v_usecount = 1, v_iflag = 0, v_vflag = 0, v_mflag = 0,
  v_dbatchcpu = -1, v_writecount = 0, v_seqc_users = 0}

(kgdb) p *fp
$3 = {f_flag = 3, f_count = 3, f_data = 0xfffff807120b5480, f_ops = 0xffffffff84b46390 <linuxfileops>, f_vnode = 0xfffff8035b4bb700, f_cred = 0xfffff8036f967d00, f_type = 11, f_vnread_flags = 0, {f_seqcount = {0, 0}, f_pipegen = 0}, f_nextoff = {0, 0}, f_vnun = {fvn_cdevpriv = 0x0, fvn_advice = 0x0}, f_offset = 0}

(kgdb) p *ndp
$5 = {ni_dirp = 0x813fd846f <error: Cannot access memory at address 0x813fd846f>, ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xffffffff812005f0 <cap_fstat_rights>, ni_startdir = 0x0, ni_rootdir = 0xfffff8003a922c40, ni_topdir = 0xfffff8003a922c40, ni_dirfd = 9, ni_lcf = 0, ni_filecaps = {fc_rights = {cr_rights = {144123984168878079, 288230376153808895}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 120}, ni_vp = 0x0, ni_dvp = 0xffffffffffffffff, ni_resflags = 4, ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1, ni_next = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, ni_cnd = {cn_flags = 262596, cn_cred = 0xfffff8068c56b200, cn_nameiop = LOOKUP, cn_lkflags = -1, cn_pnbuf = 0xfffff8002b11ec00 "", cn_nameptr = 0xfffff8002b11ec00 "", cn_namelen = -1}, ni_cap_tracker = {tqh_first = 0x0, tqh_last = 0xfffffe01adc90c08}, ni_dvp_seqc = 2915634432, ni_vp_seqc = 4294966785}

I tried looking at linux_dev_fdopen() and other code in sys/compat/linuxkpi/common/src/linux_compat.c, but haven't made much progress yet.

I have the crash dump, so if there is anything else I can provide or look at, please let me know.

Thank you.
--
Andriy Gapon

