Patching ng_iface to allow setting the MTU via netgraph API
Hello!

I am working on a meshnet concept and am using netgraph to put it right on top of the MAC layer. I am using ng_iface to tunnel IP over my protocol, and I thought it would be nice to be able to set the MTU of the created interface through the netgraph interface.

I created a patch to do just that and thought I could share it in case someone finds it useful, but I couldn't figure out where to share it, and the contribution section in the handbook didn't make it clear to me. If someone could point me in the right direction, it would be appreciated. I'll attach the patch to this mail as well, since it is quite small.

Cordially,
Andreas Kempe

Index: share/man/man4/ng_iface.4
===================================================================
--- share/man/man4/ng_iface.4	(revision 338702)
+++ share/man/man4/ng_iface.4	(working copy)
@@ -111,6 +111,8 @@
 .It Dv NGM_IFACE_BROADCAST Pq Ic broadcast
 Set the interface to broadcast mode.
 The interface must not currently be up.
+.It Dv NGM_IFACE_SET_MTU Pq Ic setmtu
+Set the MTU of the interface. Given as a 16 bit unsigned integer.
 .El
 .Sh SHUTDOWN
 This node shuts down upon receipt of a
Index: sys/netgraph/ng_iface.c
===================================================================
--- sys/netgraph/ng_iface.c	(revision 338702)
+++ sys/netgraph/ng_iface.c	(working copy)
@@ -181,6 +181,13 @@
 	  NULL,
 	  &ng_parse_uint32_type
 	},
+	{
+	  NGM_IFACE_COOKIE,
+	  NGM_IFACE_SET_MTU,
+	  "setmtu",
+	  &ng_parse_uint16_type,
+	  NULL
+	},
 	{ 0 }
 };

@@ -601,6 +608,7 @@
 	struct ng_mesg *resp = NULL;
 	int error = 0;
 	struct ng_mesg *msg;
+	struct ifreq ifr;

 	NGI_GET_MSG(item, msg);
 	switch (msg->header.typecookie) {
@@ -646,6 +654,13 @@
 			*((uint32_t *)resp->data) = priv->ifp->if_index;
 			break;

+		case NGM_IFACE_SET_MTU:
+		    {
+			ifr.ifr_mtu = *((uint16_t *)msg->data);
+			error = ng_iface_ioctl(ifp, SIOCSIFMTU, (caddr_t) &ifr);
+			break;
+		    }
+
 		default:
 			error = EINVAL;
 			break;
Index: sys/netgraph/ng_iface.h
===================================================================
--- sys/netgraph/ng_iface.h	(revision 338702)
+++ sys/netgraph/ng_iface.h	(working copy)
@@ -68,6 +68,7 @@
 	NGM_IFACE_POINT2POINT,
 	NGM_IFACE_BROADCAST,
 	NGM_IFACE_GET_IFINDEX,
+	NGM_IFACE_SET_MTU,
 };

 #endif /* _NETGRAPH_NG_IFACE_H_ */
Re: Patching ng_iface to allow setting the MTU via netgraph API
On 2018-10-10 22:33, Eugene Grosbein wrote:
> However, this patch does not seem quite right to me. It may serve your needs
> but it is incomplete in general case. You see, change in MTU should affect
> not only interface itself, it should also alter routing table and
> "interface link" routes that also have MTU attribute that is used for
> handling outgoing IP packets utilizing such routes.

I am no expert on FreeBSD internals. I was wondering about this exact aspect and was hoping to get feedback just like this.

> Why do you want to replicate this at NETGRAPH level?
> ng_iface(4) was created to be generic network interface to NOT duplicate
> such things.

After reading this, I went back and read the ifconfig manual again. My wish was based on a misconception: I thought that the point-to-point vs. broadcast setting was general for all interfaces and could readily be set with ifconfig ptp. As that setting was already duplicated, I didn't see any real issue with setting the MTU via netgraph as well. Realising that this is not the case has made me see that my idea isn't a great one.

At least I know how to submit a patch in the future if I come up with a properly thought-out idea.

Thank you for the feedback!

// Andreas Kempe

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Infiniband: Mellanox MT26418 in ethernet mode causes crash on shutdown
Hello,

When running a Mellanox MT26418 in ethernet mode, the kernel crashes with the following stack trace on system shutdown:

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x0
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x80e3f5f4
> stack pointer           = 0x28:0xfe064abec6e0
> frame pointer           = 0x28:0xfe064abec700
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1 (init)
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0x80b4c5b7 at kdb_backtrace+0x67
> #1 0x80b05b57 at vpanic+0x177
> #2 0x80b059d3 at panic+0x43
> #3 0x8106efdf at trap_fatal+0x35f
> #4 0x8106f039 at trap_pfault+0x49
> #5 0x8106e807 at trap+0x2c7
> #6 0x8104f03c at calltrap+0x8
> #7 0x80e3fae2 at mlx4_en_stop_port+0x3d2
> #8 0x80e40ff6 at mlx4_en_destroy_netdev+0x1e6
> #9 0x80e3e47d at mlx4_en_remove+0xcd
> #10 0x80e1ab01 at mlx4_remove_device+0xb1
> #11 0x80e1b0b8 at mlx4_unregister_device+0x98
> #12 0x80e1c5c5 at mlx4_unload_one+0x85
> #13 0x80e23543 at mlx4_shutdown+0x83
> #14 0x80d6b6e9 at linux_pci_shutdown+0x39
> #15 0x80b4004a at bus_generic_shutdown+0x5a
> #16 0x80b4004a at bus_generic_shutdown+0x5a
> #17 0x80b4004a at bus_generic_shutdown+0x5a

I've traced the issue to the following lines of code in mlx4_en_destroy_netdev() in sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c:

> /* Unregister device - this will close the port if it was up */
> if (priv->registered) {
>         mutex_lock(&mdev->state_lock);
>         ether_ifdetach(dev);
>         mutex_unlock(&mdev->state_lock);
> }
>
> mutex_lock(&mdev->state_lock);
> mlx4_en_stop_port(dev);
> mutex_unlock(&mdev->state_lock);

The issue is that mlx4_en_stop_port() follows the call chain below and tries to fetch the MAC address of the device in mlx4_en_put_qp:

mlx4_en_destroy_netdev->mlx4_en_stop_port->mlx4_en_put_qp

This makes the kernel fault because the MAC address has already been freed by the preceding call to ether_ifdetach, in if_detach_internal, through the following call chain:

mlx4_en_destroy_netdev->ether_ifdetach->if_detach->if_detach_internal

I've written a small workaround that works on our test machine, although I suspect it could cause issues, since we now destroy the port before we destroy the interface. Please see the attached patch for the workaround.

Cordially,
Andreas Kempe
Lysator ACS

--- sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c.old	2019-02-24 01:01:54.759307000 +0100
+++ sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c	2019-02-24 01:04:07.872558000 +0100
@@ -1764,16 +1764,19 @@
 	if (priv->vlan_detach != NULL)
 		EVENTHANDLER_DEREGISTER(vlan_unconfig, priv->vlan_detach);

+	/* Bring the interface down before destroying the port. */
+	if_down(dev);
+
+	mutex_lock(&mdev->state_lock);
+	mlx4_en_stop_port(dev);
+	mutex_unlock(&mdev->state_lock);
+
 	/* Unregister device - this will close the port if it was up */
 	if (priv->registered) {
 		mutex_lock(&mdev->state_lock);
 		ether_ifdetach(dev);
 		mutex_unlock(&mdev->state_lock);
 	}
-
-	mutex_lock(&mdev->state_lock);
-	mlx4_en_stop_port(dev);
-	mutex_unlock(&mdev->state_lock);

 	if (priv->allocated)
 		mlx4_free_hwq_res(mdev->dev, &priv->res, MLX4_EN_PAGE_SIZE);
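The ordering problem described in this message can be illustrated with a small userland toy model. This is a sketch under stated assumptions, not driver code: struct toy_if and the toy_* functions are hypothetical stand-ins for the interface, ether_ifdetach() and mlx4_en_stop_port(), and the model returns -1 where the real driver dereferences the freed address.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an interface; not the real struct ifnet. */
struct toy_if {
	unsigned char *lladdr;	/* released on detach, like the MAC address */
	int port_up;
};

/* Allocate a toy interface with a 6-byte address and the port up. */
struct toy_if *
toy_create(void)
{
	struct toy_if *ifp = calloc(1, sizeof(*ifp));

	if (ifp == NULL)
		return (NULL);
	ifp->lladdr = calloc(6, 1);
	if (ifp->lladdr == NULL) {
		free(ifp);
		return (NULL);
	}
	ifp->port_up = 1;
	return (ifp);
}

/* Models the detach step: the link-layer address goes away. */
void
toy_detach(struct toy_if *ifp)
{
	assert(ifp != NULL);
	free(ifp->lladdr);
	ifp->lladdr = NULL;
}

/* Models stopping the port, which still needs the address. */
int
toy_stop_port(struct toy_if *ifp)
{
	assert(ifp != NULL);
	if (ifp->lladdr == NULL)
		return (-1);	/* the real code would fault here */
	ifp->port_up = 0;
	return (0);
}

/* Teardown in the order used by the workaround: stop, then detach. */
int
toy_destroy(struct toy_if *ifp)
{
	int error = toy_stop_port(ifp);

	toy_detach(ifp);
	return (error);
}
```

In this model toy_destroy() succeeds, whereas calling toy_detach() before toy_stop_port() reports the error, which mirrors the reasoning behind moving the mlx4_en_stop_port() call above ether_ifdetach().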
Re: Infiniband: Mellanox MT26418 in ethernet mode causes crash on shutdown
On 2019-02-25 10:28, Hans Petter Selasky wrote:
> I think the if_down() call is not strictly needed. ether_ifdetach()
> already does this. Can you test the patch w/o the if_down() call?

I only added the call because I was not sure what would happen if the port is destroyed while the interface is still active, for instance whether operations performed on the interface during teardown could cause issues.

That said, I've tested without the call to if_down() and it seems to work.

Cordially,
Andreas Kempe
Infiniband: IPv6 neighbour discovery issues using Mellanox MT26418 between Linux and FreeBSD
Hello!

We have been trying to use IPv6 together with Mellanox MT26418 and are having issues with neighbour discovery responses not getting back to the requesting machine when connecting a Linux machine to a FreeBSD machine.

The request is visible in tcpdump on both machines. On the responding machine, the outgoing response message is visible, but on the requesting machine, the response is not visible. It didn't matter whether the Linux box or the FreeBSD box was the requester.

We applied the attached patch to ndp (since ndp is currently lacking support for link layer addresses larger than 6 bytes) to be able to set static neighbours and then the traffic got through. When running traffic between two FreeBSD hosts, it worked without any manual intervention.

The Linux machine is running Gentoo with Linux 4.14.83 and was acting as the OpenSM master, while the FreeBSD machines are running 11.2-RELEASE-p9.

Does anyone recognise these issues? Thank you for any assistance!

Cordially,
Andreas Kempe

--- usr.sbin/ndp/ndp.c.old	2019-03-13 22:06:05.472614000 +0100
+++ usr.sbin/ndp/ndp.c	2019-03-14 01:10:30.049934000 +0100
@@ -388,7 +388,7 @@
 	register struct sockaddr_dl *sdl;
 	register struct rt_msghdr *rtm = &(m_rtmsg.m_rtm);
 	struct addrinfo hints, *res;
-	int gai_error;
+	int gai_error, l;
 	u_char *ea;
 	char *host = argv[0], *eaddr = argv[1];

@@ -410,8 +410,9 @@
 	sin->sin6_scope_id =
 	    ((struct sockaddr_in6 *)res->ai_addr)->sin6_scope_id;
 	ea = (u_char *)LLADDR(&sdl_m);
-	if (ndp_ether_aton(eaddr, ea) == 0)
-		sdl_m.sdl_alen = 6;
+	l = ndp_ether_aton(eaddr, ea);
+	if (l != -1)
+		sdl_m.sdl_alen = l;
 	flags = expire_time = 0;
 	while (argc-- > 0) {
 		if (strncmp(argv[0], "temp", 4) == 0) {
@@ -804,17 +805,33 @@
 static int
 ndp_ether_aton(char *a, u_char *n)
 {
-	int i, o[6];
+	int i, l, o[20];
+	char buf[60];
+	char *p;

-	i = sscanf(a, "%x:%x:%x:%x:%x:%x", &o[0], &o[1], &o[2],
-	    &o[3], &o[4], &o[5]);
-	if (i != 6) {
-		fprintf(stderr, "ndp: invalid Ethernet address '%s'\n", a);
-		return (1);
+	l = 0;
+	p = strncpy(buf, a, sizeof(buf));
+	if (p < (buf + sizeof(buf))) {
+		for (p = strtok(buf, ":"); p; l++, p = strtok(NULL, ":")) {
+			if (l > 19) {
+				/* l = 0 to indicate an error */
+				l = 0;
+				break;
+			}
+
+			if (sscanf(p, "%x", &o[l]) != 1) {
+				break;
+			}
+		}
 	}
-	for (i = 0; i < 6; i++)
+	if (l > 20 || (l != 6 && l != 20)) {
+		fprintf(stderr, "ndp: invalid Ethernet address '%s'\n",
+		    a);
+		return (-1);
+	}
+	for (i = 0; i < l; i++)
 		n[i] = o[i];
-	return (0);
+	return (l);
 }

 static void
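The parsing idea behind the patch, accepting either a 6-byte Ethernet address or a 20-byte IPoIB link-layer address written as colon-separated hex, can also be sketched as a standalone function. This is a hedged alternative and not the patch itself: the name parse_lladdr, the two length macros and the return convention are assumptions made for illustration, and strtoul is used in place of strtok/sscanf so that no temporary copy of the input is needed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#define LLADDR_ETHER_LEN	6	/* Ethernet MAC address */
#define LLADDR_IPOIB_LEN	20	/* IPoIB link-layer address */

/*
 * Hypothetical helper: parse "aa:bb:..." into out[].
 * Returns the number of bytes parsed (6 or 20), or -1 on error.
 * out must have room for LLADDR_IPOIB_LEN bytes.
 */
int
parse_lladdr(const char *s, unsigned char *out)
{
	int n = 0;

	assert(s != NULL && out != NULL);
	for (;;) {
		char *end;
		unsigned long v;

		/* Reject empty groups, signs, blanks and surplus groups. */
		if (!isxdigit((unsigned char)*s) || n >= LLADDR_IPOIB_LEN)
			return (-1);
		v = strtoul(s, &end, 16);
		if (v > 0xff)
			return (-1);
		out[n++] = (unsigned char)v;
		if (*end == '\0')
			break;
		if (*end != ':')
			return (-1);
		s = end + 1;
	}
	if (n != LLADDR_ETHER_LEN && n != LLADDR_IPOIB_LEN)
		return (-1);
	return (n);
}
```

As in the patch, the return value doubles as the address length, so a caller could assign it directly to sdl_alen after checking for -1.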
[PATCH] ipoib: Patch for crash in icmp_error, fault trap 12
Hello everyone,

We have been using IP over IB in connected mode between a Linux machine running Void Linux and another machine running FreeBSD 12.1 STABLE. After initially transferring data at the expected speed, about 5 Gbit/s, and then letting the computers sit idle for a while, the FreeBSD machine throws transmission timeout errors. When a new data transfer is started, the machine complains that it cannot send a few packets because they are too large. After this, the kernel panics. See example logs below.

Timing out:
> ib0: timing out; 7 sends not completed

When starting new transfers:
> ib0: packet len 32812 (> 2044) too long to send, dropping
> ib0: packet len 8248 (> 2044) too long to send, dropping

Kernel crash:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address   = 0x28
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x80d76edf
> stack pointer           = 0x28:0xfe008edbeb50
> frame pointer           = 0x28:0xfe008edbebb0
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (ipoib)
> trap number             = 12
> panic: page fault
> cpuid = 3
> time = 1578710936
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe008edbe7b0
> vpanic() at vpanic+0x17e/frame 0xfe008edbe810
> panic() at panic+0x43/frame 0xfe008edbe870
> trap_pfault() at trap_pfault/frame 0xfe008edbe8e0
> trap_pfault() at trap_pfault+0x4f/frame 0xfe008edbe950
> trap() at trap+0x288/frame 0xfe008edbea80
> calltrap() at calltrap+0x8/frame 0xfe008edbea80
> --- trap 0xc, rip = 0x80d76edf, rsp = 0xfe008edbeb50, rbp = 0xfe008edbebb0 ---
> icmp_error() at icmp_error+0x2f/frame 0xfe008edbebb0
> ipoib_cm_mb_reap() at ipoib_cm_mb_reap+0x154/frame 0xfe008edbec00
> linux_work_fn() at linux_work_fn+0xfc/frame 0xfe008edbec60
> taskqueue_run_locked() at taskqueue_run_locked+0x144/frame 0xfe008edbecc0
> taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfe008edbecf0
> fork_exit() at fork_exit+0x7e/frame 0xfe008edbed30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe008edbed30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic

The access to 0x28 that triggers the trap comes from the error statistics if statement at the top of icmp_error in sys/netinet/ip_icmp.c:

> if (type != ICMP_REDIRECT)
>         ICMPSTAT_INC(icps_error);

ICMPSTAT_INC needs the VIMAGE context for the current thread to be set. Its caller, ipoib_cm_mb_reap, runs in its own thread. It gets scheduled when ipoib_cm_send finds that a packet is too large for the MTU: ipoib_cm_send calls ipoib_cm_mb_too_long, which in turn schedules ipoib_cm_mb_reap (both functions are located in sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c).

The attached patch fixes the issue by setting the VIMAGE context for the thread in ipoib_cm_mb_reap. We have not yet investigated why the packets are perceived as too large for the MTU, but our machine stopped crashing after applying the patch.

Cordially,
Andreas Kempe

Index: sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c
===================================================================
--- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c	(revision 356611)
+++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c	(working copy)
@@ -1265,6 +1265,8 @@

 	spin_lock_irqsave(&priv->lock, flags);

+	CURVNET_SET_QUIET(priv->dev->if_vnet);
+
 	for (;;) {
 		IF_DEQUEUE(&priv->cm.mb_queue, mb);
 		if (mb == NULL)
@@ -1291,6 +1293,8 @@
 		spin_lock_irqsave(&priv->lock, flags);
 	}

+	CURVNET_RESTORE();
+
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
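The reasoning in this message, that a deferred work handler has to establish the network stack context itself before touching per-stack counters, can be mimicked in a userland toy model. Everything below is a hypothetical illustration and not kernel code: struct toy_vnet, toy_curvnet and the toy_* functions are invented names, a plain global stands in for the per-thread context, and the model returns -1 where the kernel faults.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one virtual network stack instance. */
struct toy_vnet {
	unsigned long icmp_errors;	/* per-stack counter, like icps_error */
};

/* Toy stand-in for the "current vnet" of the running thread. */
static struct toy_vnet *toy_curvnet;

/*
 * Models a per-stack statistics increment: it is only valid while a
 * context is set. Returns -1 where the kernel would fault instead.
 */
int
toy_stat_inc(void)
{
	if (toy_curvnet == NULL)
		return (-1);
	toy_curvnet->icmp_errors++;
	return (0);
}

/*
 * Models a deferred work handler that sets the context of its
 * interface around the body and restores the previous one afterwards,
 * the same shape as the CURVNET_SET_QUIET()/CURVNET_RESTORE() pair
 * added by the patch.
 */
int
toy_reap(struct toy_vnet *if_vnet)
{
	struct toy_vnet *saved = toy_curvnet;
	int error;

	assert(if_vnet != NULL);
	toy_curvnet = if_vnet;		/* enter the interface's context */
	error = toy_stat_inc();
	toy_curvnet = saved;		/* restore the previous context */
	return (error);
}
```

Without the set/restore pair, toy_stat_inc() fails because no context is active, which corresponds to the situation ipoib_cm_mb_reap was in before the patch.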
[PATCH]: ipoib with mlx4 initialisation ordering
Hello everyone,

We have had issues with our machine using IPoIB on FreeBSD with the mlx4 driver. The machine would hang on shutdown.

We traced the issue to IPoIB registering multicast groups that increase the reference count of the port in the ib_multicast client. When shutting down the machine, the kernel tore down the ib_multicast client before it tore down IPoIB, so the teardown waited forever for references that were never released before it could delete the multicast client.

This issue can be remedied by changing the initialisation of the IPoIB module to happen after the mlx4 driver is initialised. By doing this, all multicast groups will be cleaned up before the ib_multicast client is destroyed. See the attached patch.

Sponsored by: Lysator ACS

Cordially,
Andreas Kempe

--- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c	2020-02-21 20:52:35.311328000 +0100
+++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c	2020-02-22 01:06:20.720997000 +0100
@@ -1754,7 +1754,7 @@
 	}
 }

-module_init(ipoib_init_module);
+module_init_order(ipoib_init_module, SI_ORDER_FOURTH);
 module_exit(ipoib_cleanup_module);

 static int