Hello everyone,

We have been using IP over IB in connected mode between a Linux machine running Void Linux and a machine running FreeBSD 12.1-STABLE. After initially transferring data at the expected speed, about 5 Gbit/s, and then leaving the machines idle for a while, the FreeBSD machine starts logging transmission timeout errors. When a new data transfer is then started, it complains that it cannot send a few packets because they are too large, and after that the kernel panics. See the example logs below:

Timing out:

> ib0: timing out; 7 sends not completed

When starting new transfers:

> ib0: packet len 32812 (> 2044) too long to send, dropping
> ib0: packet len 8248 (> 2044) too long to send, dropping

Kernel crash:

> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address = 0x28
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80d76edf
> stack pointer = 0x28:0xfffffe008edbeb50
> frame pointer = 0x28:0xfffffe008edbebb0
> code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (ipoib)
> trap number = 12
> panic: page fault
> cpuid = 3
> time = 1578710936
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008edbe7b0
> vpanic() at vpanic+0x17e/frame 0xfffffe008edbe810
> panic() at panic+0x43/frame 0xfffffe008edbe870
> trap_pfault() at trap_pfault/frame 0xfffffe008edbe8e0
> trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008edbe950
> trap() at trap+0x288/frame 0xfffffe008edbea80
> calltrap() at calltrap+0x8/frame 0xfffffe008edbea80
> --- trap 0xc, rip = 0xffffffff80d76edf, rsp = 0xfffffe008edbeb50, rbp = 0xfffffe008edbebb0 ---
> icmp_error() at icmp_error+0x2f/frame 0xfffffe008edbebb0
> ipoib_cm_mb_reap() at ipoib_cm_mb_reap+0x154/frame 0xfffffe008edbec00
> linux_work_fn() at linux_work_fn+0xfc/frame 0xfffffe008edbec60
> taskqueue_run_locked() at taskqueue_run_locked+0x144/frame 0xfffffe008edbecc0
> taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe008edbecf0
> fork_exit() at fork_exit+0x7e/frame 0xfffffe008edbed30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe008edbed30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic

The access to address 0x28 that causes the trap comes from the error statistics if statement at the top of icmp_error in sys/netinet/ip_icmp.c:

> if (type != ICMP_REDIRECT)
>         ICMPSTAT_INC(icps_error);

ICMPSTAT_INC needs the VIMAGE for the current thread to be set. Its caller, ipoib_cm_mb_reap, runs in its own thread, where the VIMAGE is not set. That thread is scheduled when ipoib_cm_send finds that a packet is too large for the MTU: ipoib_cm_send calls ipoib_cm_mb_too_long, which in turn schedules ipoib_cm_mb_reap (both functions are located in sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c).

The attached patch fixes the issue by setting the VIMAGE for the thread in ipoib_cm_mb_reap. We have not yet investigated why the packets are perceived as too large for the MTU, but our machine stopped crashing after applying the patch.

Cordially,
Andreas Kempe

Index: sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c
===================================================================
--- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c	(revision 356611)
+++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c	(working copy)
@@ -1265,6 +1265,8 @@
 
 	spin_lock_irqsave(&priv->lock, flags);
 
+	CURVNET_SET_QUIET(priv->dev->if_vnet);
+
 	for (;;) {
 		IF_DEQUEUE(&priv->cm.mb_queue, mb);
 		if (mb == NULL)
@@ -1291,6 +1293,8 @@
 		spin_lock_irqsave(&priv->lock, flags);
 	}
 
+	CURVNET_RESTORE();
+
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
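
For readers unfamiliar with the pattern, the set/restore idea behind the patch can be modelled in user space roughly as follows. This is only an illustrative sketch, not kernel code: struct vnet_model, cur_vnet_model, model_icmpstat_inc, model_reap_unpatched and model_reap_patched are all hypothetical names invented for this example. The real kernel macros keep the current vnet per thread and fault on a missing one, whereas this model uses a single global pointer and returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for one network stack instance. */
struct vnet_model {
	unsigned long icps_error;	/* models the per-vnet ICMP error counter */
};

/*
 * Hypothetical stand-in for the "current vnet" of the running thread.
 * The kernel keeps this per thread; one global pointer is enough here.
 */
struct vnet_model *cur_vnet_model = NULL;

/*
 * Models a statistics increment that depends on the current vnet.
 * Returns 0 on success, or -1 where the kernel would instead take a
 * page fault by dereferencing a pointer that was never set.
 */
int
model_icmpstat_inc(void)
{
	if (cur_vnet_model == NULL)
		return (-1);
	cur_vnet_model->icps_error++;
	return (0);
}

/* Models the worker before the patch: the context is never set. */
int
model_reap_unpatched(struct vnet_model *if_vnet)
{
	(void)if_vnet;		/* available, but not installed as current */
	return (model_icmpstat_inc());
}

/* Models the worker after the patch: save, set, do the work, restore. */
int
model_reap_patched(struct vnet_model *if_vnet)
{
	struct vnet_model *saved;
	int rc;

	assert(if_vnet != NULL);
	saved = cur_vnet_model;		/* remember the previous context */
	cur_vnet_model = if_vnet;	/* use the interface's vnet */
	rc = model_icmpstat_inc();
	cur_vnet_model = saved;		/* leave the thread as we found it */
	return (rc);
}
```

In this model, calling model_reap_unpatched with a valid vnet_model still fails because nothing installs it as the current one, while model_reap_patched succeeds and leaves cur_vnet_model unchanged afterwards.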