On 20.03.2014, at 14:51, woll...@bimajority.org wrote:

> In article <21290.60558.750106.630...@hergotha.csail.mit.edu>, I wrote:
>
>> Since we put this server into production, random network system calls
>> have started failing with [EFBIG] or maybe sometimes [EIO]. I've
>> observed this with a simple ping, but various daemons also log the
>> errors:
>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5
>
> I found at least one call stack where this happens and it does get
> returned all the way to userspace:
>
>  17  15547  _bus_dmamap_load_buffer:return
>               kernel`_bus_dmamap_load_mbuf_sg+0x5f
>               kernel`bus_dmamap_load_mbuf_sg+0x38
>               kernel`ixgbe_xmit+0xcf
>               kernel`ixgbe_mq_start_locked+0x94
>               kernel`ixgbe_mq_start+0x12a
>               if_lagg.ko`lagg_transmit+0xc4
>               kernel`ether_output_frame+0x33
>               kernel`ether_output+0x4fe
>               kernel`ip_output+0xd74
>               kernel`tcp_output+0xfea
>               kernel`tcp_usr_send+0x325
>               kernel`sosend_generic+0x3f6
>               kernel`soo_write+0x5e
>               kernel`dofilewrite+0x85
>               kernel`kern_writev+0x6c
>               kernel`sys_write+0x64
>               kernel`amd64_syscall+0x5ea
>               kernel`0xffffffff808443c7

This looks pretty similar to what we’ve seen when we got EFBIG:

  3  28502  _bus_dmamap_load_buffer:return
              kernel`_bus_dmamap_load_mbuf_sg+0x5f
              kernel`bus_dmamap_load_mbuf_sg+0x38
              kernel`ixgbe_xmit+0xcf
              kernel`ixgbe_mq_start_locked+0x94
              kernel`ixgbe_mq_start+0x12a
              kernel`ether_output_frame+0x33
              kernel`ether_output+0x4fe
              kernel`ip_output+0xd74
              kernel`rip_output+0x229
              kernel`sosend_generic+0x3f6
              kernel`kern_sendit+0x1a3
              kernel`sendit+0xdc
              kernel`sys_sendto+0x4d
              kernel`amd64_syscall+0x5ea
              kernel`0xffffffff80d35667

In our case it looks like some of the ixgbe TX queues get stuck while
others don’t. You can test whether your server shows the same symptoms
with this command:

# for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done

Pinning ping to each CPU in turn should make it transmit on a different
TX queue, so a stuck queue shows up as sendto errors on some CPUs only.
(The {0..7} brace expansion needs bash or zsh; adjust the range to your
number of CPUs.)

We also use 82599EB-based ixgbe controllers on the affected systems.

Also see these two threads on freebsd-net:

http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html

I started the second one; it has more details on what we were seeing,
in case you’re interested.

Then there is:

http://www.freebsd.org/cgi/query-pr.cgi?pr=183390

and:

https://bugs.freenas.org/issues/4560

Markus

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"