On Sat, Sep 27, 2014 at 7:57 AM, Alexander V. Chernikov <melif...@freebsd.org> wrote:
> Author: melifaro
> Date: Sat Sep 27 13:57:48 2014
> New Revision: 272211
> URL: http://svnweb.freebsd.org/changeset/base/272211
>
> Log:
>   Use underlying ports counters to get lagg statistics instead of
>   per-packet accounting.
>   This introduce user-visible changes like aggregating error counters.
>
>   Reviewed by:  asomers (prev.version), glebius
>   CR:           D781
>   MFC after:    2 weeks
>   Sponsored by: Yandex LLC
>
> Modified:
>   head/sys/net/if_lagg.c
>   head/sys/net/if_lagg.h
>   head/sys/net/if_var.h
>

I think this change is causing a lock order reversal (LOR) and a deadlock.
It happens if I create a lagg and then quickly destroy it. The deadlocked
threads have these stack traces:

Tracing command ifconfig pid 7334 tid 100823 td 0xfffff8014ff34000
sched_switch() at sched_switch+0x48a/frame 0xfffffe20b3771470
mi_switch() at mi_switch+0x167/frame 0xfffffe20b37714a0
turnstile_wait() at turnstile_wait+0x3be/frame 0xfffffe20b37714f0
__mtx_lock_sleep() at __mtx_lock_sleep+0x196/frame 0xfffffe20b3771570
__mtx_lock_flags() at __mtx_lock_flags+0x10d/frame 0xfffffe20b37715c0
_rm_rlock() at _rm_rlock+0x28b/frame 0xfffffe20b3771600
_rm_rlock_debug() at _rm_rlock_debug+0x11f/frame 0xfffffe20b3771640
lagg_get_counter() at lagg_get_counter+0x4c/frame 0xfffffe20b37716c0
if_data_copy() at if_data_copy+0xa1/frame 0xfffffe20b37716e0
sysctl_rtsock() at sysctl_rtsock+0x56c/frame 0xfffffe20b3771860
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x8a/frame 0xfffffe20b37718a0
sysctl_root() at sysctl_root+0x188/frame 0xfffffe20b3771920
userland_sysctl() at userland_sysctl+0x16e/frame 0xfffffe20b37719c0
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe20b3771a70
amd64_syscall() at amd64_syscall+0x314/frame 0xfffffe20b3771bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b3771bf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800fceeea, rsp = 0x7fffffffe408, rbp = 0x7fffffffe440 ---

Tracing command ifconfig pid 7331 tid 100796 td 0xfffff80066df5a00
sched_switch() at sched_switch+0x48a/frame 0xfffffe20b36ea630
mi_switch() at mi_switch+0x167/frame 0xfffffe20b36ea660
turnstile_wait() at turnstile_wait+0x3be/frame 0xfffffe20b36ea6b0
__rw_wlock_hard() at __rw_wlock_hard+0xb5/frame 0xfffffe20b36ea740
_rw_wlock_cookie() at _rw_wlock_cookie+0xbc/frame 0xfffffe20b36ea780
lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x5c/frame 0xfffffe20b36ea7c0
lagg_ioctl() at lagg_ioctl+0x115a/frame 0xfffffe20b36ea8a0
ifioctl() at ifioctl+0xdc1/frame 0xfffffe20b36ea930
kern_ioctl() at kern_ioctl+0x246/frame 0xfffffe20b36ea990
sys_ioctl() at sys_ioctl+0x171/frame 0xfffffe20b36eaa70
amd64_syscall() at amd64_syscall+0x314/frame 0xfffffe20b36eabf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b36eabf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fd417a, rsp = 0x7fffffffe228, rbp = 0x7fffffffe2a0 ---

The problem is that lagg_get_counter takes LAGG_RLOCK while the thread
already holds IF_ADDR_RLOCK, which was acquired at rtsock.c:1717.
Meanwhile, another thread called IF_ADDR_WLOCK at if_lagg.c:1581 after
having already taken LAGG_WLOCK at if_lagg.c:1530. Each thread holds the
lock the other one is waiting for. I think this revision introduced the
problem because reading the lagg's counters did not previously require
the LAGG_RLOCK.

Do you have any ideas on how to fix it?

-Alan
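
For illustration only, the two acquisition orders described above can be modelled in a small userland C sketch. This is not kernel code and not a proposed fix. The names lock_id, path_acquire, path_release and lor_demo are hypothetical, and the order table is only a toy stand-in for the kind of lock-order bookkeeping a checker such as WITNESS performs. The sketch replays the sysctl path (IF_ADDR, then LAGG) and the ioctl path (LAGG, then IF_ADDR) one after the other and flags the second as a reversal of the first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical identifiers standing in for the two kernel locks. */
enum lock_id { LK_IF_ADDR, LK_LAGG, LK_COUNT };

static const char *lock_name[LK_COUNT] = { "IF_ADDR", "LAGG" };

/* order_seen[a][b] != 0 means some path acquired b while holding a. */
static int order_seen[LK_COUNT][LK_COUNT];

/* The set of locks one code path currently holds. */
struct path {
	int held[LK_COUNT];
};

/*
 * Record that path p acquires lock l.  Returns 1 if this acquisition
 * reverses an order recorded earlier, 0 otherwise.
 */
static int
path_acquire(struct path *p, enum lock_id l)
{
	int h, reversal = 0;

	assert(l < LK_COUNT);
	for (h = 0; h < LK_COUNT; h++) {
		if (!p->held[h] || h == (int)l)
			continue;
		if (order_seen[l][h]) {
			printf("LOR: %s acquired while holding %s, "
			    "opposite order seen earlier\n",
			    lock_name[l], lock_name[h]);
			reversal = 1;
		}
		order_seen[h][l] = 1;
	}
	p->held[l] = 1;
	return (reversal);
}

static void
path_release(struct path *p, enum lock_id l)
{
	p->held[l] = 0;
}

/* Replay both paths from the traces; returns the number of reversals. */
int
lor_demo(void)
{
	struct path sysctl_path, ioctl_path;
	int reversals = 0;

	memset(order_seen, 0, sizeof(order_seen));
	memset(&sysctl_path, 0, sizeof(sysctl_path));
	memset(&ioctl_path, 0, sizeof(ioctl_path));

	/* sysctl_rtsock -> if_data_copy -> lagg_get_counter */
	reversals += path_acquire(&sysctl_path, LK_IF_ADDR);
	reversals += path_acquire(&sysctl_path, LK_LAGG);
	path_release(&sysctl_path, LK_LAGG);
	path_release(&sysctl_path, LK_IF_ADDR);

	/* lagg_ioctl -> lagg_ether_cmdmulti */
	reversals += path_acquire(&ioctl_path, LK_LAGG);
	reversals += path_acquire(&ioctl_path, LK_IF_ADDR);
	path_release(&ioctl_path, LK_IF_ADDR);
	path_release(&ioctl_path, LK_LAGG);

	return (reversals);
}
```

Calling lor_demo() from a test main() reports one reversal, on the ioctl path's IF_ADDR acquisition. In the real kernel the two paths run concurrently, so the same pair of orders can block both threads instead of just being reported.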