Introduction.
The affected system is stable/14, amd64.
The kernel is custom and is configured with INVARIANTS.
The panic started to happen fairly reliably after a recent package upgrade.
I suspect that the trigger is in linux-nvidia-libs-570.124.04, but that the
bug itself is in the FreeBSD Linux emulation.
The reason for my suspicion is that the crash happens when starting a graphical
Linux application in a Linux jail, and the crash involves a graphics-related
character device.
For the record, the jail itself, including the application, hasn't been changed.
I also haven't touched the base system recently.
Details.
VNASSERT failed: old > 0 not true at sys/kern/vfs_subr.c:3361 (vrefact)
0xfffff802945df380: type VCHR state VSTATE_CONSTRUCTED op 0xffffffff8127b648
usecount 1, writecount 0, refcount 39 seqc users 0 rdev 0xfffff8004565f400
hold count flags ()
flags ()
lock type devfs: UNLOCKED
dev drm/128
panic: vrefact: wrong use count 0
cpuid = 1
time = 1742796535
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8061eadb = db_trace_self_wrapper+0x2b/frame 0xfffffe02476a0780
kdb_backtrace() at 0xffffffff80956a57 = kdb_backtrace+0x37/frame 0xfffffe02476a0830
vpanic() at 0xffffffff80907629 = vpanic+0x169/frame 0xfffffe02476a0970
panic() at 0xffffffff80907403 = panic+0x43/frame 0xfffffe02476a09d0
vrefact() at 0xffffffff809f08e4 = vrefact+0xb4/frame 0xfffffe02476a09f0
fgetvp_lookup() at 0xffffffff808ac718 = fgetvp_lookup+0x88/frame 0xfffffe02476a0a30
namei_setup() at 0xffffffff809e07ba = namei_setup+0x15a/frame 0xfffffe02476a0a80
namei_emptypath() at 0xffffffff809e0499 = namei_emptypath+0x49/frame 0xfffffe02476a0ae0
namei() at 0xffffffff809e029f = namei+0x66f/frame 0xfffffe02476a0b40
linux_kern_statat() at 0xffffffff8a09d24c = linux_kern_statat+0xfc/frame 0xfffffe02476a0c70
linux_newfstatat() at 0xffffffff8a09cfed = linux_newfstatat+0x6d/frame 0xfffffe02476a0e00
amd64_syscall() at 0xffffffff80c79f79 = amd64_syscall+0x189/frame 0xfffffe02476a0f30
fast_syscall_common() at 0xffffffff80c4fb9b = fast_syscall_common+0xf8/frame 0xfffffe02476a0f30
--- syscall (262, Linux ELF64, linux_newfstatat), rip = 0x813f13eee, rsp = 0x7fffffffbd28, rbp = 0 ---
As far as I understand, the Linux application makes an fstatat (newfstatat)
system call with the AT_EMPTY_PATH flag and an empty path on a file descriptor
that refers to the opened /dev/drm/128 device.
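For illustration, the userland side of such a call can be sketched as below.
This is only a minimal sketch of what I assume the application or its libraries
do; the helper name stat_open_fd is hypothetical and does not come from the
application or from the nvidia libraries.

```c
#define _GNU_SOURCE             /* exposes AT_EMPTY_PATH in glibc's <fcntl.h> */
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: stat the object behind an already-open descriptor.
 * With an empty path and AT_EMPTY_PATH the lookup starts and ends at fd
 * itself, so no path component is resolved.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
stat_open_fd(int fd, struct stat *st)
{
	return (fstatat(fd, "", st, AT_EMPTY_PATH));
}
```

In the crashing case fd would be the descriptor obtained by opening the DRM
device node inside the jail, and the kernel has to take a use reference on the
descriptor's vnode to serve the request.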
The crash happens in fgetvp_lookup -> vrefact, which is generic code that only
detects the inconsistent state, so I think it's unlikely that the problem is in
that call path itself.
I believe that the problem is elsewhere, in the Linux emulation code that works
with character devices.
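I think that the panic means that the corresponding file descriptor was open but
the associated vnode had a usecount of zero. (The vnode dumps show usecount 1,
presumably because vrefact increments the counter before it checks the old
value.)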
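The invariant behind the assertion can be modeled in a few lines of userland C.
This is a toy model under my own assumptions, not the kernel code: toy_vnode
and toy_vrefact are hypothetical names, and only the use count is modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct vnode; only the use count is modeled. */
struct toy_vnode {
	atomic_int usecount;
};

/*
 * Model of the check: take one more use reference and report whether the
 * caller's object was already in use (previous count > 0).  An open file
 * is expected to hold such a reference, so a false result corresponds to
 * the state in which an INVARIANTS kernel panics.
 */
static bool
toy_vrefact(struct toy_vnode *vp)
{
	int old = atomic_fetch_add(&vp->usecount, 1);

	return (old > 0);
}
```

Note that the counter is incremented even when the check fails, which would be
consistent with the dumps showing a use count of 1 after a panic that reports 0.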
The file has f_type 11 (DTYPE_DEV), which appears to be used only in the
linuxkpi code, e.g., linux_dev_fdopen.
Some info from kgdb.
(kgdb) bt
#0 __curthread () at sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=textdump@entry=1) at sys/kern/kern_shutdown.c:423
#2 0xffffffff80907121 in kern_reboot (howto=260) at sys/kern/kern_shutdown.c:541
#3 0xffffffff80907698 in vpanic (fmt=0xffffffff80e35cf8 "%s: wrong use count %d", ap=0xfffffe01adc909b0) at sys/kern/kern_shutdown.c:1021
#4 0xffffffff80907403 in panic (fmt=<unavailable>) at sys/kern/kern_shutdown.c:945
#5 0xffffffff809f08e4 in vrefact (vp=0xfffff8035b4bb700) at sys/kern/vfs_subr.c:3361
#6 0xffffffff808ac718 in fgetvp_lookup (ndp=ndp@entry=0xfffffe01adc90b58, vpp=vpp@entry=0xfffffe01adc90ac8) at sys/kern/kern_descrip.c:3134
#7 0xffffffff809e07ba in namei_setup (ndp=ndp@entry=0xfffffe01adc90b58, dpp=dpp@entry=0xfffffe01adc90ac8, pwdp=pwdp@entry=0xfffffe01adc90ac0) at sys/kern/vfs_lookup.c:383
#8 0xffffffff809e0499 in namei_emptypath (ndp=ndp@entry=0xfffffe01adc90b58) at sys/kern/vfs_lookup.c:466
#9 0xffffffff809e029f in namei (ndp=ndp@entry=0xfffffe01adc90b58) at sys/kern/vfs_lookup.c:687
#10 0xffffffff8a09d24c in linux_kern_statat (td=0xfffff804d50d7000, flag=16384, fd=9, path=0x813fd846f <error: Cannot access memory at address 0x813fd846f>, pathseg=UIO_USERSPACE, sbp=sbp@entry=0xfffffe01adc90c80) at sys/compat/linux/linux_stats.c:103
#11 0xffffffff8a09cfed in linux_newfstatat (td=<unavailable>, td@entry=<error reading variable: value is not available>, args=0xfffff804d50d7400, args@entry=<error reading variable: value is not available>) at sys/compat/linux/linux_stats.c:620
#12 0xffffffff80c79f79 in syscallenter (td=0xfffff804d50d7000) at sys/amd64/amd64/../../kern/subr_syscall.c:191
#13 amd64_syscall (td=0xfffff804d50d7000, traced=<optimized out>) at sys/amd64/amd64/trap.c:1206
(kgdb) p *vp
$1 = {v_type = VCHR, v_state = VSTATE_CONSTRUCTED, v_irflag = 0, v_seqc = 0,
v_nchash = 1973399077, v_hash = 56314807, v_op = 0xffffffff8127b648
<devfs_specops>, v_data = 0xfffff80055005200, v_mount = 0xfffffe0150b46100,
v_nmntvnodes = {
tqe_next = 0xfffff8038000da80, tqe_prev = 0xfffff8035b4bb8e8},
{v_mountedhere = 0xfffff800452b9400, v_unpcb = 0xfffff800452b9400, v_rdev =
0xfffff800452b9400, v_fifoinfo = 0xfffff800452b9400}, v_hashlist = {le_next =
0x0, le_prev = 0x0},
v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last =
0xfffff8035b4bb758}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name =
0xffffffff80d1cf3c "devfs", lo_flags = 116588544, lo_data = 0, lo_witness = 0x0},
lk_lock = 1, lk_exslpfail = 0, lk_pri = 64, lk_timo = 51}, v_interlock =
{lock_object = {lo_name = 0xffffffff80db24c1 "vnode interlock", lo_flags =
16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, v_vnlock =
0xfffff8035b4bb770,
v_vnodelist = {tqe_next = 0xfffff8035b4bbc40, tqe_prev = 0xfffff80369f48280},
v_lazylist = {tqe_next = 0x0, tqe_prev = 0x0}, v_bufobj = {bo_lock =
{lock_object = {lo_name = 0xffffffff80df4394 "bufobj interlock", lo_flags =
86179840,
lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops =
0xffffffff812b7190 <buf_ops_bio>, bo_object = 0x0, bo_synclist = {le_next = 0x0,
le_prev = 0x0}, bo_private = 0xfffff8035b4bb700, bo_clean = {bv_hd = {tqh_first
= 0x0,
tqh_last = 0xfffff8035b4bb828}, bv_root = {pt_root = 0x1}, bv_cnt = 0},
bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb848}, bv_root =
{pt_root = 0x1}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_domain = 0,
bo_bsize = 512}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl =
{rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb890}, rl_currdep =
0x0}, v_holdcnt = 32, v_usecount = 1, v_iflag = 0, v_vflag = 0, v_mflag = 0,
v_dbatchcpu = -1, v_writecount = 0, v_seqc_users = 0}
(kgdb) p *fp
$3 = {f_flag = 3, f_count = 3, f_data = 0xfffff807120b5480, f_ops =
0xffffffff84b46390 <linuxfileops>, f_vnode = 0xfffff8035b4bb700, f_cred =
0xfffff8036f967d00, f_type = 11, f_vnread_flags = 0, {f_seqcount = {0, 0},
f_pipegen = 0},
f_nextoff = {0, 0}, f_vnun = {fvn_cdevpriv = 0x0, fvn_advice = 0x0}, f_offset
= 0}
(kgdb) p *ndp
$5 = {ni_dirp = 0x813fd846f <error: Cannot access memory at address
0x813fd846f>, ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xffffffff812005f0
<cap_fstat_rights>, ni_startdir = 0x0, ni_rootdir = 0xfffff8003a922c40,
ni_topdir = 0xfffff8003a922c40, ni_dirfd = 9, ni_lcf = 0, ni_filecaps =
{fc_rights = {cr_rights = {144123984168878079, 288230376153808895}}, fc_ioctls =
0x0, fc_nioctls = -1, fc_fcntls = 120}, ni_vp = 0x0, ni_dvp = 0xffffffffffffffff,
ni_resflags = 4, ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1, ni_next =
0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>,
ni_cnd = {cn_flags = 262596, cn_cred = 0xfffff8068c56b200, cn_nameiop = LOOKUP,
cn_lkflags = -1, cn_pnbuf = 0xfffff8002b11ec00 "", cn_nameptr =
0xfffff8002b11ec00 "", cn_namelen = -1}, ni_cap_tracker = {tqh_first = 0x0,
tqh_last = 0xfffffe01adc90c08}, ni_dvp_seqc = 2915634432, ni_vp_seqc = 4294966785}
I have looked at linux_dev_fdopen() and other code in
sys/compat/linuxkpi/common/src/linux_compat.c, but haven't made much progress yet.
I have the crash dump, so please let me know if there is anything else I can
provide or look at.
Thank you.
--
Andriy Gapon