Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)
On 6/8/23 00:24, Warner Losh wrote:
> PCIe 3 or PCIe 4?

PCIe 4.

nda0 at nvme0 bus 0 scbus0 target 0 lun 1
nda0: Serial Number S55KNC0TC00168
nda0: nvme version 1.3 x8 (max x8) lanes PCIe Gen4 (max Gen4) link
nda0: 6104710MB (12502446768 512 byte sectors)

-- 
Rebecca Cran
Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)
What filesystem? Is TRIM enabled on that drive? Have you tried disabling TRIM? I had a similar SSD-related problem on a Samsung SSD a long time ago that was related to TRIM. Maybe the drive firmware can be updated, too? :-)

-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)
On 6/8/23 04:25, Tomek CEDRO wrote:
> what filesystem? is TRIM enabled on that drive? have you tried
> disabling trim? i had similar ssd related problem on samsung's ssd
> long time ago that was related to trim. maybe drive firmware can be
> updated too? :-)

It's ZFS, using the default options when creating it via the FreeBSD installer, so I presume TRIM is enabled. Without a reliable way to reproduce the error I'm not sure disabling TRIM will help at the moment.

I don't think there's any newer firmware for it.

-- 
Rebecca Cran
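As an aside for anyone following along: on OpenZFS-based FreeBSD the pool-level TRIM behavior can be inspected and toggled without recreating the pool. A minimal sketch using zpool(8), assuming the installer's default pool name `zroot` (property availability depends on the OpenZFS version):

```shell
# Check whether the pool issues TRIM automatically as space is freed
# (autotrim defaults to off; the installer does not normally change it).
zpool get autotrim zroot

# Toggle automatic TRIM for testing.
zpool set autotrim=off zroot
zpool set autotrim=on zroot

# A one-shot manual TRIM can still be run on demand:
zpool trim zroot
zpool status -t zroot   # shows per-vdev TRIM progress
```

Note that with `autotrim=off` the drive only sees TRIM when `zpool trim` is run explicitly, which makes it easier to correlate TRIM activity with the write errors.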
Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)
On Thu, Jun 8, 2023, 4:35 AM Rebecca Cran wrote:
> It's ZFS, using the default options when creating it via the FreeBSD
> installer so I presume TRIM is enabled. Without a reliable way to
> reproduce the error I'm not sure disabling TRIM will help at the moment.
>
> I don't think there's any newer firmware for it.

PCIe Gen 4 has a higher error rate, so that needs to be managed with retries. There's a whole protocol to do that, which Linux implements. I suspect the time has come for us to do so too. There's some code floating around I'll have to track down.

Warner
Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)
On 6/8/23 05:48, Warner Losh wrote:
> PCIe Gen 4 has a higher error rate, so that needs to be managed with
> retries. There's a whole protocol to do that, which Linux implements.
> I suspect the time has come for us to do so too. There's some code
> floating around I'll have to track down.

Thanks. I dropped the configuration down to PCIe Gen 3 and the errors have so far gone away.

nda0: nvme version 1.3 x8 (max x8) lanes PCIe Gen3 (max Gen4) link
nda1: nvme version 1.3 x4 (max x4) lanes PCIe Gen3 (max Gen4) link

-- 
Rebecca Cran
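The negotiated link speed and width can also be confirmed from the host side rather than from the dmesg lines alone. A sketch using pciconf(8); the `nvme0` device name is an assumption and may differ on your system:

```shell
# List devices with their capability blocks; the PCI-Express capability
# line reports negotiated vs. maximum link values, e.g. a Gen4 device
# trained down to Gen3 shows something like "speed 8.0(16.0) x8(x8)".
pciconf -lc | grep -A6 '^nvme0'
```

Comparing this output before and after changing the firmware's PCIe generation setting verifies that the downgrade actually took effect.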
CURRENT snapshot won't boot due to missing ZFS feature
Hi,

I didn't dig into this yet.

After installing the current 14-snapshot (June 1st) in a bhyve-vm, I get this on boot:

ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2

(booting stops at this point)

Seems like the boot loader is missing this recently added feature.

Best
Michael

-- 
Michael Gmelin
Re: CURRENT snapshot won't boot due to missing ZFS feature
On Thu, Jun 08, 2023 at 06:11:15PM +0200, Michael Gmelin wrote:
> After installing the current 14-snapshot (June 1st) in a bhyve-vm, I
> get this on boot:
>
> ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
>
> (booting stops at this point)
>
> Seems like the boot loader is missing this recently added feature.

Can you try today's snapshot? They are propagated to most mirrors now.

Glen
Re: CURRENT snapshot won't boot due to missing ZFS feature
Michael Gmelin wrote:
> After installing the current 14-snapshot (June 1st) in a bhyve-vm, I
> get this on boot:
>
> ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
>
> (booting stops at this point)

Are you sure it was June 1st's? I saw this problem on:

FreeBSD-14.0-CURRENT-amd64-20230427-60167184abd5-262599-disc1.iso

...but it was fixed since (for me, at least):

FreeBSD-14.0-CURRENT-amd64-20230504-4194bbb34c60-262746-disc1.iso
Re: CURRENT snapshot won't boot due to missing ZFS feature
Yuri wrote:
> Are you sure it was June 1st's? I saw this problem on:
>
> FreeBSD-14.0-CURRENT-amd64-20230427-60167184abd5-262599-disc1.iso
>
> ...but it was fixed since (for me, at least):
>
> FreeBSD-14.0-CURRENT-amd64-20230504-4194bbb34c60-262746-disc1.iso

Trying to remember, I think I hit "send" too soon: it was actually 20230504 that had the problem, and I think I had to use the previous one to install. Sorry for the noise.
Re: CURRENT snapshot won't boot due to missing ZFS feature
On Thu, 8 Jun 2023 16:20:12 +0000, Glen Barber wrote:
> Can you try today's snapshot? They are propagated to most mirrors
> now.

Tried today's snapshot, same problem.

# reboot
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 done
All buffers synced.
Uptime: 4m14s
Consoles: userboot

FreeBSD/amd64 User boot lua, Revision 1.2
ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
ERROR: cannot open /boot/lua/loader.lua: no such file or directory.

Type '?' for a list of commands, 'help' for more detailed help.
OK

That's after installing CURRENT in a fresh vm managed by vm-bhyve using bsdinstall's automatic ZFS option.

Best
Michael

-- 
Michael Gmelin
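For readers hitting the same "unsupported feature" message on physical hardware, the usual fix is to reinstall the loader from the running (newer) world onto the ESP so it understands the pool's feature flags. A hedged sketch, assuming a UEFI system whose ESP is the first partition of nda0; adjust the device and partition to match your `gpart show` output:

```shell
# Locate the efi partition.
gpart show nda0

# Mount the ESP and replace the loader with the one from the new world.
mount -t msdosfs /dev/nda0p1 /mnt
cp /boot/loader.efi /mnt/efi/freebsd/loader.efi
# Some setups boot the fallback path instead:
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
umount /mnt
```

This does not help in the bhyve case below, where the loader in use comes from the host rather than from the guest's disk.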
OpenSSL 3.0 in the base system update
As previously mentioned[1], FreeBSD 14.0 will include OpenSSL 3.0. We expect to merge the update to main in the near future (within the next week or two) and are ready for wider testing.

Supported by the FreeBSD Foundation, Pierre Pronchery has been working on the update in the src tree, with assistance from Enji Cooper (ngie@) and me (emaste@). Thanks to Antoine Brodin (antoine@) and Muhammad Moinur Rahman (bofh@) for ports exp-runs and fixes/workarounds, and to Dag-Erling (des@) for updating ldns in the base system.

## Base system compatibility status

Most of the base system is ready for a seamless switch to OpenSSL 3.0. For several components we've added `-DOPENSSL_API_COMPAT=0x10100000L` to CFLAGS to specify the API version, which avoids deprecation warnings from OpenSSL 3.0. Changes have also been made to avoid OpenSSL APIs that were already deprecated in OpenSSL 1.1. We can continue the process of updating to contemporary APIs after OpenSSL 3.0 is in the tree.

Additional changes are still required for libarchive and seven Kerberos-related libraries or tools. Workarounds are ready to go along with the OpenSSL 3 import, and proper fixes are in progress in the upstream projects. A segfault from `openssl x509` in the i386 ports exp-run is under investigation and needs to be addressed prior to the merge.

## Ports compatibility

With bofh@'s recent www/node18 and www/node20 patches the ports tree is in reasonable shape for OpenSSL 3.0 in the base system. The exp-run (link below) has a list of the failing ports, and I've emailed all of the maintainers as a heads-up. None of the remaining failures are responsible for a large number of skipped ports (i.e., the failures are either leaf ports or are responsible for only a small number of skipped ports). I expect that some or many of these will need to be addressed after the change lands in the src tree.

## Call for testing

We welcome feedback from anyone willing to test the work in progress. Pierre's update can be obtained from the pull request[2] or by fetching the branch[3]. If desired I will provide a large diff against main.

## Links

- Base system OpenSSL 3.0 update tracking PR: https://bugs.freebsd.org/271615
- Ports exp-run with OpenSSL 3.0 in the base system: https://bugs.freebsd.org/271656

[1] https://lists.freebsd.org/archives/freebsd-current/2023-May/003609.html
[2] https://github.com/freebsd/freebsd-src/pull/760
[3] https://github.com/khorben/freebsd-src/tree/khorben/openssl-3.0.9
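For testers, one way to fetch and build the work-in-progress branch[3] is sketched below; the clone URL and branch name are taken from the links above, and the build commands are the standard src-tree workflow (adjust -j and paths to taste):

```shell
# Clone only the OpenSSL 3.0 work branch from Pierre's fork.
git clone -b khorben/openssl-3.0.9 --single-branch \
    https://github.com/khorben/freebsd-src.git
cd freebsd-src

# Standard world build; test in a VM or jail rather than on a
# production system.
make -j"$(sysctl -n hw.ncpu)" buildworld
```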
Re: CURRENT snapshot won't boot due to missing ZFS feature
On Thu, 8 Jun 2023 19:06:23 +0200, Michael Gmelin wrote:
> Tried today's snapshot, same problem.
>
> FreeBSD/amd64 User boot lua, Revision 1.2
> ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
> ERROR: cannot open /boot/lua/loader.lua: no such file or directory.
>
> That's after installing CURRENT in a fresh vm managed by vm-bhyve
> using bsdinstall's automatic ZFS option.

Thinking about this, it's possible that vm-bhyve is using the zfs boot loader from the host machine.

Please consider this noise, unless you hear from me again.

Best
Michael

-- 
Michael Gmelin
Re: CURRENT snapshot won't boot due to missing ZFS feature
On Thu, Jun 8, 2023, 11:18 AM Michael Gmelin wrote:
> > That's after installing CURRENT in a fresh vm managed by vm-bhyve
> > using bsdinstall's automatic ZFS option.
>
> Thinking about this, it's possible that vm-bhyve is using the zfs boot
> loader from the host machine.
>
> Please consider this noise, unless you hear from me again.

Yes. It does. This can be an unfortunate design choice at times.

Warner
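The host-loader dependency can be sidestepped by booting the guest through its own EFI loader instead of bhyveload (which links against the host's /boot/userboot.so). A sketch of the relevant vm-bhyve configuration fragment; the assumption here is that the edk2 firmware package (e.g. bhyve-firmware) is installed:

```shell
# In the guest's vm-bhyve config file (e.g. /vm/<name>/<name>.conf):
#
# FreeBSD guest templates typically default to the host-side loader:
#   loader="bhyveload"
#
# Switching to UEFI boots the loader installed inside the guest, so a
# CURRENT guest with new ZFS pool features brings its own matching
# boot code:
loader="uefi"
```

With `loader="uefi"` the guest's /boot/loader.efi is what reads the pool, so pool features and loader stay in sync as the guest is upgraded.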
panic(s) in ZFS on CURRENT
Hi,

I got several panics on my desktop running eb2b00da564, which is after the latest OpenZFS merge.

#1 (a couple of cores with this backtrace)

--- trap 0x9, rip = 0x803ab94b, rsp = 0xfe022e45ed30, rbp = 0xfe022e45ed50 ---
buf_hash_insert() at buf_hash_insert+0xab/frame 0xfe022e45ed50
arc_write_done() at arc_write_done+0xfa/frame 0xfe022e45ed90
zio_done() at zio_done+0xf0b/frame 0xfe022e45ee00
zio_execute() at zio_execute+0x9f/frame 0xfe022e45ee40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfe022e45eec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc3/frame 0xfe022e45eef0
fork_exit() at fork_exit+0x7d/frame 0xfe022e45ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe022e45ef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

(kgdb) frame 7
#7  buf_hash_insert (hdr=hdr@entry=0xf8001b21fa28, lockp=lockp@entry=0xfe022e45ed60)
    at /usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:1062
1062        if (HDR_EQUAL(hdr->b_spa, &hdr->b_dva, hdr->b_birth, fhdr))
(kgdb) p hdr
$1 = (arc_buf_hdr_t *) 0xf8001b21fa28
(kgdb) p *hdr
$2 = {b_dva = {dva_word = {16, 20406677952}}, b_birth = 38447120, b_type = ARC_BUFC_METADATA,
  b_complevel = 255 '\377', b_reserved1 = 0 '\000', b_reserved2 = 0, b_hash_next = 0x0,
  b_flags = (ARC_FLAG_L2CACHE | ARC_FLAG_IO_IN_PROGRESS | ARC_FLAG_BUFC_METADATA |
  ARC_FLAG_HAS_L1HDR | ARC_FLAG_COMPRESSED_ARC | ARC_FLAG_COMPRESS_0 | ARC_FLAG_COMPRESS_1 |
  ARC_FLAG_COMPRESS_2 | ARC_FLAG_COMPRESS_3), b_psize = 8, b_lsize = 32,
  b_spa = 1230587331341359116, b_l2hdr = {b_dev = 0x0, b_daddr = 0, b_hits = 0,
  b_arcs_state = ARC_STATE_ANON, b_l2node = {list_next = 0x0, list_prev = 0x0}},
  b_l1hdr = {b_cv = {cv_description = 0x80bb5b02 "hdr->b_l1hdr.b_cv", cv_waiters = 0},
  b_byteswap = 10 '\n', b_state = 0x80ef23c0, b_arc_node = {list_next = 0x0, list_prev = 0x0},
  b_arc_access = 0, b_mru_hits = 0, b_mru_ghost_hits = 0, b_mfu_hits = 0, b_mfu_ghost_hits = 0,
  b_bufcnt = 1, b_buf = 0xf80003139d80, b_refcnt = {rc_count = 2}, b_acb = 0x0,
  b_pabd = 0xf80a35dc6480}, b_crypt_hdr = {b_rabd = 0x10, b_ot = 2744191968, b_ebufcnt = 4,
  b_dsobj = 38340866, b_salt = "\001\000\000\000\000\000\000",
  b_iv = "\000\000\000\000\000\000\000\000\220\000\026\017",
  b_mac = "\b\000 \000\f\230\262m\250\354\023\021\000\000\000"}}

#2 (single core)

--- trap 0x9, rip = 0x803ab94b, rsp = 0xfe0256158780, rbp = 0xfe02561587a0 ---
buf_hash_insert() at buf_hash_insert+0xab/frame 0xfe02561587a0
arc_hdr_realloc() at arc_hdr_realloc+0x138/frame 0xfe0256158800
arc_read() at arc_read+0x2dc/frame 0xfe02561588b0
dbuf_read() at dbuf_read+0xb3e/frame 0xfe02561589f0
dmu_buf_hold() at dmu_buf_hold+0x46/frame 0xfe0256158a30
zap_cursor_retrieve() at zap_cursor_retrieve+0x167/frame 0xfe0256158a90
zfs_freebsd_readdir() at zfs_freebsd_readdir+0x383/frame 0xfe0256158cc0
VOP_READDIR_APV() at VOP_READDIR_APV+0x1f/frame 0xfe0256158ce0
kern_getdirentries() at kern_getdirentries+0x186/frame 0xfe0256158dd0
sys_getdirentries() at sys_getdirentries+0x29/frame 0xfe0256158e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfe0256158f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0256158f30

(kgdb) frame 7
#7  buf_hash_insert (hdr=0xf80c906b03e8, lockp=lockp@entry=0x0)
    at /usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:1062
1062        if (HDR_EQUAL(hdr->b_spa, &hdr->b_dva, hdr->b_birth, fhdr))
(kgdb) p *hdr
$1 = {b_dva = {dva_word = {16, 19965896928}}, b_birth = 36629088, b_type = ARC_BUFC_METADATA,
  b_complevel = 0 '\000', b_reserved1 = 0 '\000', b_reserved2 = 0, b_hash_next = 0x0,
  b_flags = (ARC_FLAG_BUFC_METADATA | ARC_FLAG_HAS_L1HDR | ARC_FLAG_HAS_L2HDR |
  ARC_FLAG_COMPRESSED_ARC | ARC_FLAG_COMPRESS_1), b_psize = 5, b_lsize = 5,
  b_spa = 3583499065027950438, b_l2hdr = {b_dev = 0xfe02306c8000, b_daddr = 4917395456,
  b_hits = 0, b_arcs_state = ARC_STATE_MRU, b_l2node = {list_next = 0xf801313fc9b0,
  list_prev = 0xf801313fca70}}, b_l1hdr = {b_cv = {cv_description = 0x80bb5b02 "hdr->b_l1hdr.b_cv",
  cv_waiters = 0}, b_byteswap = 10 '\n', b_state = 0x80f02900, b_arc_node = {list_next = 0x0,
  list_prev = 0x0}, b_arc_access = 0, b_mru_hits = 0, b_mru_ghost_hits = 0, b_mfu_hits = 0,
  b_mfu_ghost_hits = 0, b_bufcnt = 0, b_buf = 0x0, b_refcnt = {rc_count = 0}, b_acb = 0x0,
  b_pabd = 0x0}, b_crypt_hdr = {b_rabd = 0x10, b_ot = 2786027712, b_ebufcnt = 4,
  b_dsobj = 36629088, b_salt = "\001\000\000\000\000\000\000",
  b_iv = "\240\2769$\001\370\377\377\220\000\036\017",
  b_mac = "\b\000 \000fw\357\327i%\2731\000\200l0"}}

#3 (not ZFS, but VFS; could be related?)

--- trap 0x9, rip = 0x80801408, rsp = 0xfe02348cbcc0, rbp = 0xfe02348cbcf0 ---
pwd_chdir() at pwd_chdir+0x28/frame 0xfe02348cbcf0
kern_chdir() at kern_chdir+0x169/frame 0
Re: panic(s) in ZFS on CURRENT
On Thu, Jun 08, 2023 at 07:56:07PM -0700, Gleb Smirnoff wrote:
T> I'm switching to INVARIANTS kernel right now and will see if that panics earlier.

This is what I got with INVARIANTS:

panic: VERIFY3(dev->l2ad_hand <= dev->l2ad_evict) failed (225142071296 <= 225142063104)
cpuid = 17
time = 1686286015
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe0160dcea90
kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe0160dceb40
vpanic() at vpanic+0x21f/frame 0xfe0160dcebe0
spl_panic() at spl_panic+0x4d/frame 0xfe0160dcec60
l2arc_write_buffers() at l2arc_write_buffers+0xcda/frame 0xfe0160dcedf0
l2arc_feed_thread() at l2arc_feed_thread+0x547/frame 0xfe0160dceec0
fork_exit() at fork_exit+0x122/frame 0xfe0160dcef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0160dcef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 1m4s
Dumping 5473 out of 65308 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

(kgdb) frame 4
#4  0x804342ea in l2arc_write_buffers (spa=0xfe022e942000, dev=0xfe023116a000, target_sz=16777216)
    at /usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:9445
9445        ASSERT3U(dev->l2ad_hand, <=, dev->l2ad_evict);
(kgdb) p dev
$1 = (l2arc_dev_t *) 0xfe023116a000
(kgdb) p dev->l2ad_hand
$2 = 225142071296
(kgdb) p dev->l2ad_evict
$3 = 225142063104
(kgdb) p *dev
value of type `l2arc_dev_t' requires 66136 bytes, which is more than max-value-size

I've never seen kgdb refuse to print a structure because it reported it to be too big.

-- 
Gleb Smirnoff
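The "requires 66136 bytes, which is more than max-value-size" message is a stock gdb guard against huge value allocations rather than a kgdb-specific limitation, and it can be lifted per session. A short sketch using standard gdb commands, which should apply to kgdb as well:

```
(kgdb) show max-value-size
(kgdb) set max-value-size unlimited
(kgdb) p *dev
```

A finite limit (e.g. `set max-value-size 1048576`) works too if `unlimited` is undesirable.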
CPU hog on -current (pfSense 23.05)
Hi,

Since about 7 days ago I've started to observe one CPU core being hogged on an APU2 box running pfSense 23.05 (if that matters). mjg@ suggested running a dtrace one-liner that showed:

~~~
kernel`pmap_copy                     33
kernel`amd64_syscall                 33
kernel`vm_radix_insert               35
kernel`vm_map_pmap_enter             37
kernel`vm_radix_lookup_unlocked      38
kernel`memcpy_std                    38
kernel`vm_object_deallocate          39
kernel`pmap_enter_quick_locked       41
kernel`em_update_stats_counters      43
kernel`copyout_nosmap_std            43
kernel`ck_epoch_poll_deferred        44
kernel`sbuf_put_bytes                46
kernel`vm_page_pqbatch_submit        48
kernel`pmap_remove_pte               51
kernel`pmap_pvh_remove               53
kernel`vm_pqbatch_process_page       54
kernel`cpu_search_highest            56
kernel`get_pv_entry                  57
kernel`pmap_try_insert_pv_entry      59
kernel`vm_map_lookup_entry           65
kernel`epoch_call_task               92
kernel`pmap_enter                   101
kernel`vm_fault                     110
kernel`pagecopy                     110
kernel`0x81                         133
kernel`lock_delay                   145
kernel`pmap_remove_pages            203
kernel`_thread_lock                 415
kernel`pagezero_std                 490
kernel`assert_rw                    532
kernel`acpi_cpu_c1                  600
kernel`callout_lock                 641
kernel`kern_yield                  1206
kernel`_callout_stop_safe          2010
kernel`spinlock_enter              2032
kernel`tcp_timer_stop              2703
0x05927
kernel`spinlock_exit              40964
kernel`cpu_idle                   61943
kernel`sched_idletd               76722
~~~

The symptom is that the kernel thread "kernel{if_io_tqg_1}" consumes 100% of the CPU core. Can this be debugged any further? I also suspect a hardware problem (I have one spare box where I'll put my XML config and test whether the problem persists).

I'd be very thankful for any pointers. Thanks!

otis

—
Juraj Lutter
o...@freebsd.org
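The list above looks like the output of a kernel profiling aggregation. The exact one-liner mjg@ suggested isn't quoted in the thread, so the commands below are an assumption, but something along these lines produces per-function sample counts and a per-thread view that would tie the samples to kernel{if_io_tqg_1}:

```shell
# Sample kernel PCs at ~997 Hz for 10 seconds; functions hogging a
# core float to the bottom of the sorted counts.
dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-10s { exit(0); }'

# Aggregate the same samples by thread name instead, to confirm which
# kernel thread is burning the CPU time.
dtrace -n 'profile-997 { @[stringof(curthread->td_name)] = count(); } tick-10s { exit(0); }'
```

Capturing `@[stack()]` instead of `@[func(arg0)]` in the first one-liner gives full stacks, which usually narrows a hog like this down further than flat function counts.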