https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048

            Bug ID: 244048
           Summary: mksnap_ffs hangs machine for several minutes (12.1
                    regression over 11.3)
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: m...@netfence.it

On several servers I manage, I take backups to an external HD and use
mksnap_ffs to create snapshots.
I never had any trouble with this up to 11.3.

Lately I started doing this on a 12.1 server and noticed that mksnap_ffs hangs
the box for several minutes (services get stuck, no new logins are allowed,
already established ssh sessions partially work; shutdown is not possible
without pressing the reset button).
N.B. This is an external drive, only mounted when needed and only accessed to
make backups, so even if *it* got stuck, that should not affect the whole system.

I decided to investigate and took the external HD to my desktop (11.3): it
worked perfectly.
I then upgraded my desktop to 12.1p2 and it started behaving as described above:

_ mksnap_ffs will work for several minutes under high I/O (the HD is 6TB);
meanwhile, the system is responsive;

_ then mksnap_ffs will drastically reduce its I/O (at least as measured with
top), but will keep working for several more minutes: during this phase, I
cannot open any new program; Thunderbird gets stuck; already open Firefox
windows still work, but I cannot open any new window; Audacity keeps playing the
current track, but gets stuck when moving on to the next one; already open
terminal windows might partially work;

_ after several minutes mksnap_ffs will exit and everything will get back to
normal.

This is of course unacceptable on a production server.



I built a test machine with 12.1/amd64 with the following kernel options: KDB,
KDB_TRACE, DDB, GDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS, WITNESS_SKIPSPIN,
DEBUG_VFS_LOCKS, LOCK_PROFILING, KTR, ALQ, KTR_ENTRIES=4096.
Such a kernel panicked immediately after launching mksnap_ffs, with LOR #269.

I removed WITNESS and WITNESS_SKIPSPIN, ran "fsck -y" on the disk and tried
again.
This time I got a different panic:
panic: ffs_copyonwrite: bad copy block
cpuid = 0
time = 1581243816
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001beef0e0
vpanic() at vpanic+0x19d/frame 0xfffffe001beef130
panic() at panic+0x43/frame 0xfffffe001beef190
ffs_copyonwrite() at ffs_copyonwrite+0x74c/frame 0xfffffe001beef230
ffs_geom_strategy() at ffs_geom_strategy+0x8c/frame 0xfffffe001beef260
ufs_strategy() at ufs_strategy+0x83/frame 0xfffffe001beef290
VOP_STRATEGY_APV() at VOP_STRATEGY_APV+0xc9/frame 0xfffffe001beef2c0
bufstrategy() at bufstrategy+0x44/frame 0xfffffe001beef2f0
bufwrite() at bufwrite+0x230/frame 0xfffffe001beef330
ffs_snapshot() at ffs_snapshot+0x8e0/frame 0xfffffe001beef630
ffs_mount() at ffs_mount+0xb3a/frame 0xfffffe001beef7d0
vfs_domount() at vfs_domount+0x8b6/frame 0xfffffe001beef9f0
vfs_donmount() at vfs_donmount+0x7e7/frame 0xfffffe001beefa90
sys_nmount() at sys_nmount+0xf2/frame 0xfffffe001beefac0
amd64_syscall() at amd64_syscall+0x281/frame 0xfffffe001beefbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe001beefbf0
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x8002d88ba, rsp = 0x7fffffffd288, rbp = 0x7fffffffeae0 ---
KDB: enter: panic

So I also removed INVARIANTS and INVARIANT_SUPPORT (and ran "fsck -y" twice)
in order to be able to take snapshots.

I haven't been able to collect other data yet.

_______________________________________________
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs