Re: 9.2 ixgbe tx queue hang
On 25.03.2014, at 02:18, Rick Macklem wrote:

> Christopher Forgeron wrote:
>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>
>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>
>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>
> If you look at tcp_output(), you'll notice that it doesn't do TSO if there are any options. That way it knows that the TCP/IP header is just hdrlen.
>
> If you don't limit the TSO packet (including TCP/IP and ethernet headers) to 64K, then the "ix" driver can't send them, which is the problem you guys are seeing.
>
> There are other ways to fix this problem, but they all may introduce issues that reducing if_hw_tsomax by a small amount does not. For example, m_defrag() could be modified to use 4K pagesize clusters, but this might introduce memory fragmentation problems. (I observed what I think are memory fragmentation problems when I switched NFS to use 4K pagesize clusters for large I/O messages.)
>
> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG error replies), then that is the size that if_hw_tsomax can be set to (just can't change IP_MAXPACKET, but that is defined for other things). (It just happens that IP_MAXPACKET is what if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>
>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:

Is this confirmed or still an "it seems"? Have you actually seen a tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this just speculation because the values are stored in different places? (Sorry if you already stated this in another email; it's currently hard to keep track of all the information.)

Anyway, this dtrace one-liner should be a good test for whether other values appear in tp->t_tsomax:

# dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'

Remember to adjust the value in the condition to whatever you're currently expecting. The value seems to be 0 for new connections, probably when tcp_mss() has not been called yet, so that seems normal and I have excluded that case too. This will also print a kernel stack trace in case it sees an unexpected value.

> Yes, but I don't know why. The only conjecture I can come up with is that another net driver is stacked above "ix" and the setting for if_hw_tsomax doesn't propagate up. (If you look at the commit log message for r251296, the intent of adding if_hw_tsomax was to allow device drivers to set a smaller tsomax than IP_MAXPACKET.)
>
> Are you using any of the "stacked" network device drivers like lagg? I don't even know what the others all are? Maybe someone else can list them?

I guess the most obvious are lagg and vlan (and probably carp on FreeBSD 9.x or older).

On request from Jack, we've eliminated lagg and vlan from the picture, which gives us plain ixgbe interfaces with no stacked interfaces on top of them. And we can still reproduce the problem.


Markus

> rick
>> 10.0 Code:
>>
>> 780    if (len > tp->t_tsomax - hdrlen) {      !!
>> 781            len = tp->t_tsomax - hdrlen;    !!
>> 782            sendalot = 1;
>> 783    }
>>
>> I've put debugging here, set the nic's max TSO as per Rick's patch (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET. It's being set someplace else, and thus our attempts to set TSO on the nic may be in vain.
>>
>> It may have mattered more in 9.2, as I see the code doesn't use tp->t_tsomax in some locations, and may actually default to what the nic is set to.
>>
>> The NIC may still win; I didn't walk through the code to confirm. It was enough to suggest to me that setting TSO wouldn't fix this issue.
>>
>> However, this is still a TSO related issue, it's just not one related to the setting of TSO's max size.
>>
>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a single packet over IP_MAXPACKET in 1 hour of runtime. I'll let it go a bit longer to increase confidence in this assertion, but I don't want to waste time on this when I could be logging problem packets on a system with TSO enabled.
>>
>> Comments are very welcome..
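For readers following the propagation question above: the path by which the driver's value is supposed to reach tp->t_tsomax, pieced together from the 10.0-era sources as discussed later in this thread (a paraphrase, not verbatim code):

	/* tcp_maxmtu(), netinet/tcp_subr.c: report the egress interface's limit. */
	cap->tsomax = ifp->if_hw_tsomax;

	/* tcp_mss(), netinet/tcp_input.c: store it on the connection, where
	 * the tcp_output() check quoted above (lines 780-783) consumes it. */
	if (cap.ifcap & CSUM_TSO) {
		tp->t_flags |= TF_TSO;
		tp->t_tsomax = cap.tsomax;
	}

If any hop in this chain sees a different ifnet (e.g. a stacked interface), tp->t_tsomax ends up at that interface's default rather than the driver's value, which is exactly what the dtrace probe is meant to detect.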
Re: 9.2 ixgbe tx queue hang
Hey guys,

I have nothing on your code level to add, but while investigating this issue I ran into the guy who originally filed the bug (http://www.freebsd.org/cgi/query-pr.cgi?pr=183390&cat=). In the email exchange that followed, he told me that he had found a workaround by running a specific -STABLE revision:

"Yes, we found a workaround. We upgraded to the -STABLE branch of the 9.2, so we use this currently:

[root@storagex ~]# uname -a
FreeBSD storagex.lan.granaglia.com 9.2-STABLE FreeBSD 9.2-STABLE #0 r257712: Tue Nov 5 23:02:49 CET 2013 r...@storagex.lan.granaglia.com:/usr/obj/usr/src/sys/GENERIC amd64"

Maybe this could help you in your quest to hunt this bug down.

On Tue, Mar 25, 2014 at 1:16 PM, Markus Gebert wrote:

> On 25.03.2014, at 02:18, Rick Macklem wrote:
>
>> Christopher Forgeron wrote:
>>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>>
>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>>
>>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>
>> If you look at tcp_output(), you'll notice that it doesn't do TSO if there are any options. That way it knows that the TCP/IP header is just hdrlen.
>>
>> If you don't limit the TSO packet (including TCP/IP and ethernet headers) to 64K, then the "ix" driver can't send them, which is the problem you guys are seeing.
>>
>> There are other ways to fix this problem, but they all may introduce issues that reducing if_hw_tsomax by a small amount does not. For example, m_defrag() could be modified to use 4K pagesize clusters, but this might introduce memory fragmentation problems. (I observed what I think are memory fragmentation problems when I switched NFS to use 4K pagesize clusters for large I/O messages.)
>>
>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG error replies), then that is the size that if_hw_tsomax can be set to (just can't change IP_MAXPACKET, but that is defined for other things). (It just happens that IP_MAXPACKET is what if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>>
>>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:
>
> Is this confirmed or still an "it seems"? Have you actually seen a tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this just speculation because the values are stored in different places? (Sorry if you already stated this in another email; it's currently hard to keep track of all the information.)
>
> Anyway, this dtrace one-liner should be a good test for whether other values appear in tp->t_tsomax:
>
> # dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>
> Remember to adjust the value in the condition to whatever you're currently expecting. The value seems to be 0 for new connections, probably when tcp_mss() has not been called yet, so that seems normal and I have excluded that case too. This will also print a kernel stack trace in case it sees an unexpected value.
>
>> Yes, but I don't know why. The only conjecture I can come up with is that another net driver is stacked above "ix" and the setting for if_hw_tsomax doesn't propagate up. (If you look at the commit log message for r251296, the intent of adding if_hw_tsomax was to allow device drivers to set a smaller tsomax than IP_MAXPACKET.)
>>
>> Are you using any of the "stacked" network device drivers like lagg? I don't even know what the others all are? Maybe someone else can list them?
>
> I guess the most obvious are lagg and vlan (and probably carp on FreeBSD 9.x or older).
>
> On request from Jack, we've eliminated lagg and vlan from the picture, which gives us plain ixgbe interfaces with no stacked interfaces on top of it. And we can still reproduce the problem.
>
> Markus
>
>> rick
>>> 10.0 Code:
>>>
>>> 780    if (len > tp->t_tsomax - hdrlen) {      !!
>>> 781            len = tp->t_tsomax - hdrlen;    !!
>>> 782            sendalot = 1;
>>> 783    }
>>>
>>> I've put debugging here, set the nic's max TSO as per Rick's patch (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET. It's being set someplace else, and thus our attempts to set TSO on the nic may be in vain.
Re: syslogd:sendto: no buffer available on 10-stable
Hi Simon,

Try checking out the "9.2 ixgbe tx queue hang" thread here, and see if it applies to you.

On Tue, Mar 25, 2014 at 1:55 AM, k simon wrote:

> Hi, Lists:
> I have got lots of "no buffer available" errors on 10-stable with the igb nic, but em and bce work well. I tried forcing igb to 4 or 8 queues and set hw.igb.rx_process_limit="-1", but nothing helped.
>
> Regards
> Simon
>
> # uname -a
> FreeBSD sq-l1-n2 10.0-STABLE FreeBSD 10.0-STABLE #0 r262469: Tue Feb 25 13:27:11 CST 2014 root@sq-l1-n2:/usr/obj/usr/src/sys/stable-10-262458 amd64
>
> # netstat -mb
> 19126/73289/92415 mbufs in use (current/cache/total)
> 13289/46841/60130/524288 mbuf clusters in use (current/cache/total/max)
> 13289/46812 mbuf+clusters out of packet secondary zone in use (current/cache)
> 5638/22605/28243/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/77672 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/43690 16k jumbo clusters in use (current/cache/total/max)
> 53914K/202424K/256338K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> # netstat -di
> Name    Mtu  Network   Address                  Ipkts    Ierrs Idrop       Opkts Oerrs Coll Drop
> igb0   1500            00:1b:21:70:5f:80  17212101113  1355809     0 19612978862     0    0
> igb1   1500            00:1b:21:70:5f:81  76601294282 81162751     0 74236432686     0    0
> lo0   16384                               20532742636        0     0 20522475797     0    0
> lo0       -  your-net  localhost           2736994243        -     - 20520227166     -    -
>
> # sysctl hw.igb
> hw.igb.rxd: 2048
> hw.igb.txd: 2048
> hw.igb.enable_aim: 1
> hw.igb.enable_msix: 1
> hw.igb.max_interrupt_rate: 12000
> hw.igb.buf_ring_size: 4096
> hw.igb.header_split: 0
> hw.igb.num_queues: 1
> hw.igb.rx_process_limit: 1000
>
> # sysctl dev.igb.1
> dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
> dev.igb.1.%driver: igb
> dev.igb.1.%location: slot=0 function=1
> dev.igb.1.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x8086 subdevice=0xa04c class=0x02
> dev.igb.1.%parent: pci8
> dev.igb.1.nvm: -1
> dev.igb.1.enable_aim: 1
> dev.igb.1.fc: 0
> dev.igb.1.rx_processing_limit: 4096
> dev.igb.1.link_irq: 3
> dev.igb.1.dropped: 0
> dev.igb.1.tx_dma_fail: 0
> dev.igb.1.rx_overruns: 0
> dev.igb.1.watchdog_timeouts: 0
> dev.igb.1.device_control: 1086325313
> dev.igb.1.rx_control: 67141634
> dev.igb.1.interrupt_mask: 4
> dev.igb.1.extended_int_mask: 2147483651
> dev.igb.1.tx_buf_alloc: 0
> dev.igb.1.rx_buf_alloc: 0
> dev.igb.1.fc_high_water: 58976
> dev.igb.1.fc_low_water: 58960
> dev.igb.1.queue0.no_desc_avail: 10874
> dev.igb.1.queue0.tx_packets: 74509997338
> dev.igb.1.queue0.rx_packets: 76837720630
> dev.igb.1.queue0.rx_bytes: 35589607860237
> dev.igb.1.queue0.lro_queued: 0
> dev.igb.1.queue0.lro_flushed: 0
> dev.igb.1.mac_stats.excess_coll: 0
> dev.igb.1.mac_stats.single_coll: 0
> dev.igb.1.mac_stats.multiple_coll: 0
> dev.igb.1.mac_stats.late_coll: 0
> dev.igb.1.mac_stats.collision_count: 0
> dev.igb.1.mac_stats.symbol_errors: 0
> dev.igb.1.mac_stats.sequence_errors: 0
> dev.igb.1.mac_stats.defer_count: 0
> dev.igb.1.mac_stats.missed_packets: 81162751
> dev.igb.1.mac_stats.recv_no_buff: 176691324
> dev.igb.1.mac_stats.recv_undersize: 0
> dev.igb.1.mac_stats.recv_fragmented: 0
> dev.igb.1.mac_stats.recv_oversize: 0
> dev.igb.1.mac_stats.recv_jabber: 0
> dev.igb.1.mac_stats.recv_errs: 0
> dev.igb.1.mac_stats.crc_errs: 0
> dev.igb.1.mac_stats.alignment_errs: 0
> dev.igb.1.mac_stats.coll_ext_errs: 0
> dev.igb.1.mac_stats.xon_recvd: 0
> dev.igb.1.mac_stats.xon_txd: 0
> dev.igb.1.mac_stats.xoff_recvd: 0
> dev.igb.1.mac_stats.xoff_txd: 0
> dev.igb.1.mac_stats.total_pkts_recvd: 76925709917
> dev.igb.1.mac_stats.good_pkts_recvd: 76837704301
> dev.igb.1.mac_stats.bcast_pkts_recvd: 49174716
> dev.igb.1.mac_stats.mcast_pkts_recvd: 282670
> dev.igb.1.mac_stats.rx_frames_64: 31057121854
> dev.igb.1.mac_stats.rx_frames_65_127: 19996324498
> dev.igb.1.mac_stats.rx_frames_128_255: 1171960837
> dev.igb.1.mac_stats.rx_frames_256_511: 2295894674
> dev.igb.1.mac_stats.rx_frames_512_1023: 2026241811
> dev.igb.1.mac_stats.rx_frames_1024_1522: 20290160627
> dev.igb.1.mac_stats.good_octets_recvd: 36204302378783
> dev.igb.1.mac_stats.good_octets_txd: 59038220741656
> dev.igb.1.mac_stats.total_pkts_txd: 90973292365
> dev.igb.1.mac_stats.good_pkts_txd: 90973292359
> dev.igb.1.mac_stats.bcast_pkts_txd: 2408182
> dev.igb.1.mac_stats.mcast_pkts_txd: 246782
> dev.igb.1.mac_stats.tx_frames_64: 24604769631
> dev.igb.1.mac_stats.tx_frames_65_127: 21373976133
> dev.igb.1.mac_stats.tx_frames_128_255: 3047554
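For reference, the tunables Simon mentions are boot-time loader tunables, set e.g. in /boot/loader.conf; the values below are just the ones from his message, not a recommendation:

	hw.igb.num_queues=4		# force 4 queues instead of auto-detect
	hw.igb.rx_process_limit=-1	# remove the per-interrupt RX packet limit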
Re: 9.2 ixgbe tx queue hang
Hi guys,

I'm in meetings today, so I'll respond to the other emails later. Just wanted to clarify about tp->t_tsomax: I can't make a solid assertion about its value, as I only tracked it briefly. I did see it being != if_hw_tsomax, but that was a short test and should really be checked more carefully. For now we should treat it as possible, but not confirmed.

However, setting if_hw_tsomax as low as 32k did not fix the problem for me. So either setting TSO is not the fix, or not everything is paying attention to if_hw_tsomax. It has to be one or the other. Setting IP_MAXPACKET does fix it for me, but of course that's not a solid fix.

On Tue, Mar 25, 2014 at 9:16 AM, Markus Gebert wrote:

> On 25.03.2014, at 02:18, Rick Macklem wrote:
>
>> Christopher Forgeron wrote:
>>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>>
>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>>
>>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>
>> If you look at tcp_output(), you'll notice that it doesn't do TSO if there are any options. That way it knows that the TCP/IP header is just hdrlen.
>>
>> If you don't limit the TSO packet (including TCP/IP and ethernet headers) to 64K, then the "ix" driver can't send them, which is the problem you guys are seeing.
>>
>> There are other ways to fix this problem, but they all may introduce issues that reducing if_hw_tsomax by a small amount does not. For example, m_defrag() could be modified to use 4K pagesize clusters, but this might introduce memory fragmentation problems. (I observed what I think are memory fragmentation problems when I switched NFS to use 4K pagesize clusters for large I/O messages.)
>>
>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG error replies), then that is the size that if_hw_tsomax can be set to (just can't change IP_MAXPACKET, but that is defined for other things). (It just happens that IP_MAXPACKET is what if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>>
>>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:
>
> Is this confirmed or still an "it seems"? Have you actually seen a tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this just speculation because the values are stored in different places? (Sorry if you already stated this in another email; it's currently hard to keep track of all the information.)
>
> Anyway, this dtrace one-liner should be a good test for whether other values appear in tp->t_tsomax:
>
> # dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>
> Remember to adjust the value in the condition to whatever you're currently expecting. The value seems to be 0 for new connections, probably when tcp_mss() has not been called yet, so that seems normal and I have excluded that case too. This will also print a kernel stack trace in case it sees an unexpected value.
>
>> Yes, but I don't know why. The only conjecture I can come up with is that another net driver is stacked above "ix" and the setting for if_hw_tsomax doesn't propagate up. (If you look at the commit log message for r251296, the intent of adding if_hw_tsomax was to allow device drivers to set a smaller tsomax than IP_MAXPACKET.)
>>
>> Are you using any of the "stacked" network device drivers like lagg? I don't even know what the others all are? Maybe someone else can list them?
>
> I guess the most obvious are lagg and vlan (and probably carp on FreeBSD 9.x or older).
>
> On request from Jack, we've eliminated lagg and vlan from the picture, which gives us plain ixgbe interfaces with no stacked interfaces on top of it. And we can still reproduce the problem.
>
> Markus
>
>> rick
>>> 10.0 Code:
>>>
>>> 780    if (len > tp->t_tsomax - hdrlen) {      !!
>>> 781            len = tp->t_tsomax - hdrlen;    !!
>>> 782            sendalot = 1;
>>> 783    }
>>>
>>> I've put debugging here, set the nic's max TSO as per Rick's patch (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET. It's being set someplace else, and thus our attempts to set TSO on the nic may be in vain.
Server sockets staying in CLOSED for extended periods
There has been a long thread on stable about sshd processes being hung due to sockets remaining in a CLOSED state for extended periods on 10-stable. This did not seem to be happening with 9.2. (Not sure about 10.0.)

Was there a change in the network stack in 10 that would keep CLOSED sockets around for extended intervals? Should sshd processes hang for extended periods (forever?) unless killed (KILL)? I also wonder whether a daemon process should refuse to exit if a closed socket remains.

Any suggestions would help. The full thread, with a great deal of detail, is on stable with the subject "sshd with zombie process on FreeBSD 10.0-STABLE - workaround".
--
R. Kevin Oberman, Network Engineer, Retired
E-mail: rkober...@gmail.com
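For anyone who wants to check a system for this, two stock commands should make the state visible (nothing here is specific to the reported bug):

	# netstat -an -p tcp | grep CLOSED	# TCP sockets sitting in CLOSED
	# sockstat -c | grep sshd		# map sockets back to sshd processes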
Non-interrupt packet sending and receiving
This isn't the same as the polled driver; this is sending and receiving a single packet at a time.

I've gotten (at least to a somewhat workable degree) Apple's KDP ported to FreeBSD. I've only changed the dev/e1000/if_lem.c driver for now (that's the one VMWare shows up as :)), but since I'm not particularly comfortable with device drivers, let alone ethernet drivers, I needed some feedback. Diffs are attached below. Feedback would be appreciated.

(To answer some of the questions I've already gotten: no, I can't use the DEVICE_POLLING routines, because that still goes through the entire stack. It's not here, because I am not yet happy with it, but the code that uses this runs in the kernel debugger, and it needs to be able to send and receive a single packet at a time -- and it can't let it be shuffled off through other layers, for reasons that I hope are fairly clear. Now, one change I would like to make is the mbuf allocation that it uses; ideally, honestly, it should have its own mbufs -- the protocol never sends more than 1538 bytes in a UDP packet -- but I would probably try working on another ethernet driver first, modeling it after this.)

(The bulk of the diffs is moving some code out of lem_rxeof into a function that gets a single packet.)

Thanks, Sean.

diff --git a/dev/e1000/if_lem.c b/dev/e1000/if_lem.c
index bfe2c93..90ec8b3 100644
--- a/dev/e1000/if_lem.c
+++ b/dev/e1000/if_lem.c
@@ -34,6 +34,7 @@
 #include "opt_inet.h"
 #include "opt_inet6.h"
+#include "opt_ddb.h"

 #ifdef HAVE_KERNEL_OPTION_HEADERS
 #include "opt_device_polling.h"
@@ -54,6 +55,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -191,7 +193,7 @@ static void lem_free_transmit_structures(struct adapter *);
 static void    lem_free_receive_structures(struct adapter *);
 static void    lem_update_stats_counters(struct adapter *);
 static void    lem_add_hw_stats(struct adapter *adapter);
-static void    lem_txeof(struct adapter *);
+static void    lem_txeof(struct adapter *, int);
 static void    lem_tx_purge(struct adapter *);
 static int     lem_allocate_receive_structures(struct adapter *);
 static int     lem_allocate_transmit_structures(struct adapter *);
@@ -246,6 +248,74 @@ static void lem_handle_rxtx(void *context, int pending);
 static void    lem_handle_link(void *context, int pending);
 static void    lem_add_rx_process_limit(struct adapter *, const char *,
                const char *, int *, int);
+#ifdef DDB
+typedef uint32_t (*kdp_link_t)(void);
+typedef int (*kdp_mode_t)(int);
+extern void *kdp_get_interface(void);
+typedef void (*kdp_send_t)(void * pkt, unsigned int pkt_len);
+typedef void (*kdp_receive_t)(void * pkt, unsigned int * pkt_len,
+    unsigned int timeout);
+extern void kdp_register_send_receive(kdp_send_t send, kdp_receive_t receive);
+extern void kdp_unregister_send_receive(kdp_send_t send, kdp_receive_t receive);
+
+/*
+ * Function called by kdp.
+ * timeout is in milliseconds.
+ * The data is 1538 bytes.
+ * Return the length in *length; set it to 0 if no packet.
+ */
+static void lem_kdp_recv_pkt(void *data, unsigned int *length, unsigned int timeout);
+
+static void lem_kdp_send_pkt(void *pkt, unsigned int pkt_len);
+
+/*
+ * For kdp:
+ * lem_rxeof() may be usable. However, we'd have to
+ * change the ifp->if_input() function pointer to be
+ * something more conducive to our needs.
+ *
+ * For transmitting, we'd want to use lem_txeof, but
+ * we have to figure out how to put data into the
+ * adapter queue. Something like:
+ *
+ *   struct mbchain *mbp;
+ *   struct mbuf *m;
+ *   mb_init(mbp);
+ *   mb_put_uint8(mbp, 33);
+ *   mb_put_uint16le(mbp, length);
+ *   m = m_copym(mbp->mb_top, 0, M_COPYALL, M_WAIT);
+ *
+ * Then we have to get the mbuf chain (m) into the
+ * device's queue.
+ *
+ * Then:
+ *
+ *   mb_done(mbp);
+ *
+ */
+/*
+ * Tell kdp about functions to query status.
+ * The link parameter is a pointer to a function
+ * which returns the status (it only cares about
+ * IFM_AVALID and IFM_ACTIVE). Note that it has no
+ * parameters -- so it has to use the kdp_get_interface()
+ * function to find out the current interface. This should
+ * probably change.
+ *
+ * Similarly, the mode parameter is a function which sets the
+ * status active (if its parameter is non-zero) or inactive (if
+ * its parameter is 0).
+ */
+void kdp_register_link(kdp_link_t link, kdp_mode_t mode);
+void kdp_unregister_link(kdp_link_t link, kdp_mode_t mode);
+
+// This is a bit of a lie: it actually takes a pointer to a kdp_ether_addr_t structure.
+void kdp_set_interface(void *interface, const void *macaddr);
+
+static uint32_t kdp_media_status(void);
+static int kdp_set_media_state(int);
+
+#endif
 #ifdef DEVICE_POLLING
 static poll_handler_t lem_poll;

@@ -835,7 +905,7 @@ lem_start_locked(struct ifnet *ifp)
         * available hits the threshold
         */
        if (adapter->num_tx_desc_avail <= EM_TX_CLEANUP_THRESHOLD) {
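As a concrete illustration of the "its own mbufs" idea mentioned above -- a sketch only, with invented names, not code from the diff: since a KDP packet is at most 1538 bytes, a single cluster reserved at attach time would cover every packet and keep the debugger path out of the mbuf allocator.

	/*
	 * Hypothetical: reserve one cluster mbuf up front and reuse it,
	 * so the in-debugger send path never allocates.  MCLBYTES (2048)
	 * comfortably holds the 1538-byte KDP maximum.
	 */
	static struct mbuf *kdp_m;		/* reserved at attach time */

	static int
	lem_kdp_alloc(void)
	{
		kdp_m = m_getcl(M_WAITOK, MT_DATA, M_PKTHDR);
		return (kdp_m == NULL ? ENOBUFS : 0);
	}

	static struct mbuf *
	lem_kdp_wrap_pkt(const void *pkt, unsigned int pkt_len)
	{
		if (pkt_len > MCLBYTES)
			return (NULL);
		kdp_m->m_len = kdp_m->m_pkthdr.len = pkt_len;
		memcpy(mtod(kdp_m, void *), pkt, pkt_len);
		return (kdp_m);
	}

(This assumes the transmit path hands the mbuf back rather than freeing it; if it goes through the normal free path, the reserved mbuf would have to be re-armed after each send.)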
Re: Non-interrupt packet sending and receiving
You might want to take a look at the projects/sv branch, which implements kernel core dumps over the network. We had to solve a similar problem there (in lem, em, igb and ixgbe) and ended up piggybacking on most of the DEVICE_POLLING code to do it. The work ended up stalling over objections to calling into the mbuf allocator (which I guess may be a stumbling block for your work). Unfortunately we weren't able to come up with a clean way to share the existing rx/tx paths in the drivers while separating the mbuf allocations out.
Re: 9.2 ixgbe tx queue hang
I'm quite positive that IP_MAXPACKET = 65518 would fix this, as I've never seen a packet overshoot by more than 11 bytes, although that's just in my case. It's next up on my test list.

BTW, to answer the next message: I am experiencing the error with a raw ix or lagg interface. Originally I was on lagg, but have dropped down to a single ix for testing.

Thanks for your continued help.

On Mon, Mar 24, 2014 at 10:04 PM, Rick Macklem wrote:

> Markus Gebert wrote:
>> On 24.03.2014, at 16:21, Christopher Forgeron wrote:
>>
>>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>>
>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>>
>>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>>
>>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:
>>>
>>> 10.0 Code:
>>>
>>> 780    if (len > tp->t_tsomax - hdrlen) {      !!
>>> 781            len = tp->t_tsomax - hdrlen;    !!
>>> 782            sendalot = 1;
>>> 783    }
>>>
>>> I've put debugging here, set the nic's max TSO as per Rick's patch (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET. It's being set someplace else, and thus our attempts to set TSO on the nic may be in vain.
>>>
>>> It may have mattered more in 9.2, as I see the code doesn't use tp->t_tsomax in some locations, and may actually default to what the nic is set to.
>>>
>>> The NIC may still win, I didn't walk through the code to confirm, it was enough to suggest to me that setting TSO wouldn't fix this issue.
>>
>> I just applied Rick's ixgbe TSO patch and additionally wanted to be able to easily change the value of hw_tsomax, so I made a sysctl out of it.
>>
>> While doing that, I asked myself the same question. Where and how will this value actually be used, and how come tcp_output() uses that other value in struct tcpcb?
>>
>> The only place tcpcb->t_tsomax gets set, that I have found so far, is in tcp_input.c's tcp_mss() function. Some subfunctions get called:
>>
>> tcp_mss() -> tcp_mss_update() -> tcp_maxmtu()
>>
>> Then tcp_maxmtu() indeed uses the interface's hw_tsomax value:
>>
>> 1746    cap->tsomax = ifp->if_hw_tsomax;
>>
>> It gets passed back to tcp_mss(), where it is set on the connection level, which will be used in tcp_output() later on.
>>
>> tcp_mss() gets called from multiple places, I'll look into that later. I will let you know if I find out more.
>>
>> Markus
>
> Well, if tp->t_tsomax isn't set to a value of 65518, then the ixgbe.patch isn't doing what I thought it would.
>
> The only explanation I can think of for this is that there might be another net interface driver stacked on top of the ixgbe.c one and that the setting doesn't get propagated up. Does this make any sense?
>
> IP_MAXPACKET can't be changed from 65535, but I can see an argument for setting the default value of if_hw_tsomax to a smaller value. For example, in sys/net/if.c change it from:
>
>  657    if (ifp->if_hw_tsomax == 0)
>  658            ifp->if_hw_tsomax = IP_MAXPACKET;
>
> to
>
>  657    if (ifp->if_hw_tsomax == 0)
>  658            ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
>
> This is a slightly smaller default which won't have much impact unless the hardware device can only handle 32 mbuf clusters for transmit of a segment and there are several of those.
>
> Christopher, can you do your test run with IP_MAXPACKET set to 65518, which should be the same as the above? If that gets rid of all the EFBIG error replies, then I think the above patch will have the same effect.
>
> Thanks, rick
>
>>> However, this is still a TSO related issue, it's just not one related to the setting of TSO's max size.
>>>
>>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a single packet over IP_MAXPACKET in 1 hour of runtime. I'll let it go a bit longer to increase confidence in this assertion, but I don't want to waste time on this when I could be logging problem packets on a system with TSO enabled.
>>>
>>> Comments are very welcome..
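For readers following along, the driver-side patch being tested in this thread is understood to amount to something like the following in ixgbe's interface setup, before ether_ifattach() -- a sketch with the placement assumed, matching the 65517 value reported later in the thread:

	/* Cap TSO so frame + 18-byte ethernet/VLAN header stays within 64K. */
	ifp->if_hw_tsomax = IP_MAXPACKET -
	    (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);	/* 65535 - 18 = 65517 */
	ether_ifattach(ifp, adapter->hw.mac.addr);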
Re: Non-interrupt packet sending and receiving
On Mar 25, 2014, at 12:15 PM, Ryan Stone wrote:

> You might want to take a look at the projects/sv branch, which implements kernel core dumps over the network. We had to solve a similar problem there (in lem, em, igb and ixgbe) and ended up piggybacking on most of the DEVICE_POLLING code to do it. The work ended up stalling over objections to calling into the mbuf allocator (which I guess may be a stumbling block for your work). Unfortunately we weren't able to come up with a clean way to share the existing rx/tx paths in the drivers while separating the mbuf allocations out.

I checked the XNU drivers I could find online, and it turns out that they did mbuf allocations. Since I was using that as my model, I went with it for now.

I was aware of the network core dump work, but not until I'd started to make some progress with this. (The kdp protocol theoretically has support for dumping a core over the network, but I haven't implemented that at all.)

Looking at if_lem.c, it appears similar to a large degree. Perhaps I should start from scratch...

Sean.
Re: 9.2 ixgbe tx queue hang
Markus Gebert wrote:

> On 25.03.2014, at 02:18, Rick Macklem wrote:
>
>> Christopher Forgeron wrote:
>>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>>
>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>>
>>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>
>> If you look at tcp_output(), you'll notice that it doesn't do TSO if there are any options. That way it knows that the TCP/IP header is just hdrlen.
>>
>> If you don't limit the TSO packet (including TCP/IP and ethernet headers) to 64K, then the "ix" driver can't send them, which is the problem you guys are seeing.
>>
>> There are other ways to fix this problem, but they all may introduce issues that reducing if_hw_tsomax by a small amount does not. For example, m_defrag() could be modified to use 4K pagesize clusters, but this might introduce memory fragmentation problems. (I observed what I think are memory fragmentation problems when I switched NFS to use 4K pagesize clusters for large I/O messages.)
>>
>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG error replies), then that is the size that if_hw_tsomax can be set to (just can't change IP_MAXPACKET, but that is defined for other things). (It just happens that IP_MAXPACKET is what if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>>
>>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:
>
> Is this confirmed or still an "it seems"? Have you actually seen a tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this just speculation because the values are stored in different places? (Sorry if you already stated this in another email; it's currently hard to keep track of all the information.)
>
> Anyway, this dtrace one-liner should be a good test for whether other values appear in tp->t_tsomax:
>
> # dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>
> Remember to adjust the value in the condition to whatever you're currently expecting. The value seems to be 0 for new connections, probably when tcp_mss() has not been called yet, so that seems normal and I have excluded that case too. This will also print a kernel stack trace in case it sees an unexpected value.
>
>> Yes, but I don't know why. The only conjecture I can come up with is that another net driver is stacked above "ix" and the setting for if_hw_tsomax doesn't propagate up. (If you look at the commit log message for r251296, the intent of adding if_hw_tsomax was to allow device drivers to set a smaller tsomax than IP_MAXPACKET.)
>>
>> Are you using any of the "stacked" network device drivers like lagg? I don't even know what the others all are? Maybe someone else can list them?
>
> I guess the most obvious are lagg and vlan (and probably carp on FreeBSD 9.x or older).
>
> On request from Jack, we've eliminated lagg and vlan from the picture, which gives us plain ixgbe interfaces with no stacked interfaces on top of it. And we can still reproduce the problem.
>
This was related to the "did if_hw_tsomax set tp->t_tsomax to the same value?" question. Since you reported that my patch that set if_hw_tsomax in the driver didn't fix the problem, that suggests that tp->t_tsomax isn't being set to if_hw_tsomax from the driver, but we don't know why?

rick

> Markus
>
>> rick
>>> 10.0 Code:
>>>
>>> 780    if (len > tp->t_tsomax - hdrlen) {      !!
>>> 781            len = tp->t_tsomax - hdrlen;    !!
>>> 782            sendalot = 1;
>>> 783    }
>>>
>>> I've put debugging here, set the nic's max TSO as per Rick's patch (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET. It's being set someplace else, and thus our attempts to set TSO on the nic may be in vain.
>>>
>>> It may have mattered more in 9.2, as I see the code doesn't use tp->t_tsomax in some locations, and may actually default to what the nic is set to.
>>>
>>> The NIC may still win, I didn't walk through the code to confirm, it was enough to suggest to me that setting TSO wouldn't fix this issue.
Re: 9.2 ixgbe tx queue hang
On 25.03.2014, at 22:46, Rick Macklem wrote:

> Markus Gebert wrote:
>> On 25.03.2014, at 02:18, Rick Macklem wrote:
>>
>>> Christopher Forgeron wrote:
>>>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>>>
>>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>>>
>>>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>>
>>> If you look at tcp_output(), you'll notice that it doesn't do TSO if there are any options. That way it knows that the TCP/IP header is just hdrlen.
>>>
>>> If you don't limit the TSO packet (including TCP/IP and ethernet headers) to 64K, then the "ix" driver can't send them, which is the problem you guys are seeing.
>>>
>>> There are other ways to fix this problem, but they all may introduce issues that reducing if_hw_tsomax by a small amount does not. For example, m_defrag() could be modified to use 4K pagesize clusters, but this might introduce memory fragmentation problems. (I observed what I think are memory fragmentation problems when I switched NFS to use 4K pagesize clusters for large I/O messages.)
>>>
>>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG error replies), then that is the size that if_hw_tsomax can be set to (just can't change IP_MAXPACKET, but that is defined for other things). (It just happens that IP_MAXPACKET is what if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>>>
>>>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:
>>
>> Is this confirmed or still an "it seems"? Have you actually seen a tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this just speculation because the values are stored in different places? (Sorry if you already stated this in another email; it's currently hard to keep track of all the information.)
>>
>> Anyway, this dtrace one-liner should be a good test for whether other values appear in tp->t_tsomax:
>>
>> # dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>>
>> Remember to adjust the value in the condition to whatever you're currently expecting. The value seems to be 0 for new connections, probably when tcp_mss() has not been called yet, so that seems normal and I have excluded that case too. This will also print a kernel stack trace in case it sees an unexpected value.
>>
>>> Yes, but I don't know why. The only conjecture I can come up with is that another net driver is stacked above "ix" and the setting for if_hw_tsomax doesn't propagate up. (If you look at the commit log message for r251296, the intent of adding if_hw_tsomax was to allow device drivers to set a smaller tsomax than IP_MAXPACKET.)
>>>
>>> Are you using any of the "stacked" network device drivers like lagg? I don't even know what the others all are? Maybe someone else can list them?
>>
>> I guess the most obvious are lagg and vlan (and probably carp on FreeBSD 9.x or older).
>>
>> On request from Jack, we've eliminated lagg and vlan from the picture, which gives us plain ixgbe interfaces with no stacked interfaces on top of it. And we can still reproduce the problem.
>>
> This was related to the "did if_hw_tsomax set tp->t_tsomax to the same value?" question. Since you reported that my patch that set if_hw_tsomax in the driver didn't fix the problem, that suggests that tp->t_tsomax isn't being set to if_hw_tsomax from the driver, but we don't know why?

Jack asked us to remove lagg/vlans in the very beginning of this thread, and when we had done that, the problem was still there. So my answer was not related to your recent patch. I wanted to clarify that we have been testing with ixgbe only for quite some time and that stacked interfaces could not be a source of problems in our test scenario.

We have just started testing your patch that sets if_hw_tsomax yesterday. So far I have it running on two systems along with some printfs and the dtrace one-liner that watches over tp->t_tsomax in tcp_output(). So far we haven't had any problems with these two servers, and the dtrace probe never fired; so far it looks like tp->t_tsomax always gets set from if_hw_tsomax. But it's too soon to make a conclusion, it may take days to trigger the problem.
Re: 9.2 ixgbe tx queue hang
Markus Gebert wrote:

> On 25.03.2014, at 22:46, Rick Macklem wrote:
>
>> Markus Gebert wrote:
>>> On 25.03.2014, at 02:18, Rick Macklem wrote:
>>>
>>>> Christopher Forgeron wrote:
>>>>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>>>>
>>>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>>>>
>>>>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>>>
>>>> If you look at tcp_output(), you'll notice that it doesn't do TSO if there are any options. That way it knows that the TCP/IP header is just hdrlen.
>>>>
>>>> If you don't limit the TSO packet (including TCP/IP and ethernet headers) to 64K, then the "ix" driver can't send them, which is the problem you guys are seeing.
>>>>
>>>> There are other ways to fix this problem, but they all may introduce issues that reducing if_hw_tsomax by a small amount does not. For example, m_defrag() could be modified to use 4K pagesize clusters, but this might introduce memory fragmentation problems. (I observed what I think are memory fragmentation problems when I switched NFS to use 4K pagesize clusters for large I/O messages.)
>>>>
>>>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG error replies), then that is the size that if_hw_tsomax can be set to (just can't change IP_MAXPACKET, but that is defined for other things). (It just happens that IP_MAXPACKET is what if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>>>>
>>>>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:
>>>
>>> Is this confirmed or still an "it seems"? Have you actually seen a tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this just speculation because the values are stored in different places? (Sorry if you already stated this in another email; it's currently hard to keep track of all the information.)
>>>
>>> Anyway, this dtrace one-liner should be a good test for whether other values appear in tp->t_tsomax:
>>>
>>> # dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>>>
>>> Remember to adjust the value in the condition to whatever you're currently expecting. The value seems to be 0 for new connections, probably when tcp_mss() has not been called yet, so that seems normal and I have excluded that case too. This will also print a kernel stack trace in case it sees an unexpected value.
>>>
>>>> Yes, but I don't know why. The only conjecture I can come up with is that another net driver is stacked above "ix" and the setting for if_hw_tsomax doesn't propagate up. (If you look at the commit log message for r251296, the intent of adding if_hw_tsomax was to allow device drivers to set a smaller tsomax than IP_MAXPACKET.)
>>>>
>>>> Are you using any of the "stacked" network device drivers like lagg? I don't even know what the others all are? Maybe someone else can list them?
>>>
>>> I guess the most obvious are lagg and vlan (and probably carp on FreeBSD 9.x or older).
>>>
>>> On request from Jack, we've eliminated lagg and vlan from the picture, which gives us plain ixgbe interfaces with no stacked interfaces on top of it. And we can still reproduce the problem.
>>>
>> This was related to the "did if_hw_tsomax set tp->t_tsomax to the same value?" question. Since you reported that my patch that set if_hw_tsomax in the driver didn't fix the problem, that suggests that tp->t_tsomax isn't being set to if_hw_tsomax from the driver, but we don't know why?
>
> Jack asked us to remove lagg/vlans in the very beginning of this thread, and when we had done that, the problem was still there. So my answer was not related to your recent patch. I wanted to clarify that we have been testing with ixgbe only for quite some time and that stacked interfaces could not be a source of problems in our test scenario.
>
> We have just started testing your patch that sets if_hw_tsomax yesterday.
Re: 9.2 ixgbe tx queue hang
Update:

I'm changing my mind, and I believe Rick's TSO patch is fixing things (sorry). In looking at my notes, it's possible I had lagg on for those tests. lagg does seem to negate the TSO patch in my case.

kernel.10stable_basicTSO_65535/
- IP_MAXPACKET = 65535;
- manually forced (no if statement) ifp->if_hw_tsomax = IP_MAXPACKET - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
- Verified on boot via printf that ifp->if_hw_tsomax = 65517
- Boot in a NON LAGG environment. ix0 only.

ixgbe's printf is showing packets up to 65530. Haven't run long enough yet to see if anything will go over 65535.

I have this tcpdump running to check packet size:

tcpdump -ennvvXS -i ix0 greater 65518

I do expect to get packets over 65518, but I was just curious to see if any of them would go over 65535. Time will tell.

In a separate test, if I enable lagg, we have LOTS of oversized packet problems. It looks like tsomax is definitely not making it through in if_lagg.c - any recommendations there? I will eventually need lagg, as I'm sure will others.

With dtrace, it's showing t_tsomax >= 65518. Shouldn't that not be happening?

dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax >= 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'

  6  31403  tcp_output:entry unexpected tp->t_tsomax: 65535

              kernel`tcp_do_segment+0x2c99
              kernel`tcp_input+0x11a2
              kernel`ip_input+0xa2
              kernel`netisr_dispatch_src+0x5e
              kernel`ether_demux+0x12a
              kernel`ether_nh_input+0x35f
              kernel`netisr_dispatch_src+0x5e
              kernel`bce_intr+0x765
              kernel`intr_event_execute_handlers+0xab
              kernel`ithread_loop+0x96
              kernel`fork_exit+0x9a
              kernel`0x80c75b2e

  3  31403  tcp_output:entry unexpected tp->t_tsomax: 65535

              kernel`tcp_do_segment+0x2c99
              kernel`tcp_input+0x11a2
              kernel`ip_input+0xa2
              kernel`netisr_dispatch_src+0x5e
              kernel`ether_demux+0x12a
              kernel`ether_nh_input+0x35f
              kernel`netisr_dispatch_src+0x5e
              kernel`bce_intr+0x765
              kernel`intr_event_execute_handlers+0xab
              kernel`ithread_loop+0x96
              kernel`fork_exit+0x9a
              kernel`0x80c75b2e

  6  31403  tcp_output:entry unexpected tp->t_tsomax: 65535

              kernel`tcp_do_segment+0x2c99
              kernel`tcp_input+0x11a2
              kernel`ip_input+0xa2
              kernel`netisr_dispatch_src+0x5e
              kernel`ether_demux+0x12a
              kernel`ether_nh_input+0x35f
              kernel`netisr_dispatch_src+0x5e
              kernel`bce_intr+0x765
              kernel`intr_event_execute_handlers+0xab
              kernel`ithread_loop+0x96
              kernel`fork_exit+0x9a
              kernel`0x80c75b2e

  1  31403  tcp_output:entry unexpected tp->t_tsomax: 65535

              kernel`tcp_do_segment+0x2c99
              kernel`tcp_input+0x11a2
              kernel`ip_input+0xa2
              kernel`netisr_dispatch_src+0x5e
              kernel`ether_demux+0x12a
              kernel`ether_nh_input+0x35f
              kernel`netisr_dispatch_src+0x5e
              kernel`bce_intr+0x765
              kernel`intr_event_execute_handlers+0xab
              kernel`ithread_loop+0x96
              kernel`fork_exit+0x9a
              kernel`0x80c75b2e
Re: 9.2 ixgbe tx queue hang
On 25.03.2014, at 23:21, Rick Macklem wrote:

> Markus Gebert wrote:
>> On 25.03.2014, at 22:46, Rick Macklem wrote:
>>
>>> Markus Gebert wrote:
>>>> On 25.03.2014, at 02:18, Rick Macklem wrote:
>>>>
>>>>> Christopher Forgeron wrote:
>>>>>> This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion)
>>>>>>
>>>>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch for testing/etc.
>>>>>>
>>>>>> Patching TSO to anything other than a max value (and by default the code gives it IP_MAXPACKET) is confusing the matter, as the packet length ultimately needs to be adjusted for many things on the fly like TCP Options, etc. Using static header sizes won't be a good idea.
>>>>>
>>>>> If you look at tcp_output(), you'll notice that it doesn't do TSO if there are any options. That way it knows that the TCP/IP header is just hdrlen.
>>>>>
>>>>> If you don't limit the TSO packet (including TCP/IP and ethernet headers) to 64K, then the "ix" driver can't send them, which is the problem you guys are seeing.
>>>>>
>>>>> There are other ways to fix this problem, but they all may introduce issues that reducing if_hw_tsomax by a small amount does not. For example, m_defrag() could be modified to use 4K pagesize clusters, but this might introduce memory fragmentation problems. (I observed what I think are memory fragmentation problems when I switched NFS to use 4K pagesize clusters for large I/O messages.)
>>>>>
>>>>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG error replies), then that is the size that if_hw_tsomax can be set to (just can't change IP_MAXPACKET, but that is defined for other things). (It just happens that IP_MAXPACKET is what if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>>>>>
>>>>>> Additionally, it seems that setting nic TSO will/may be ignored by code like this in sys/netinet/tcp_output.c:
>>>>
>>>> Is this confirmed or still an "it seems"? Have you actually seen a tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this just speculation because the values are stored in different places? (Sorry if you already stated this in another email; it's currently hard to keep track of all the information.)
>>>>
>>>> Anyway, this dtrace one-liner should be a good test for whether other values appear in tp->t_tsomax:
>>>>
>>>> # dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>>>>
>>>> Remember to adjust the value in the condition to whatever you're currently expecting. The value seems to be 0 for new connections, probably when tcp_mss() has not been called yet, so that seems normal and I have excluded that case too. This will also print a kernel stack trace in case it sees an unexpected value.
>>>>
>>>>> Yes, but I don't know why. The only conjecture I can come up with is that another net driver is stacked above "ix" and the setting for if_hw_tsomax doesn't propagate up. (If you look at the commit log message for r251296, the intent of adding if_hw_tsomax was to allow device drivers to set a smaller tsomax than IP_MAXPACKET.)
>>>>>
>>>>> Are you using any of the "stacked" network device drivers like lagg? I don't even know what the others all are? Maybe someone else can list them?
>>>>
>>>> I guess the most obvious are lagg and vlan (and probably carp on FreeBSD 9.x or older).
>>>>
>>>> On request from Jack, we've eliminated lagg and vlan from the picture, which gives us plain ixgbe interfaces with no stacked interfaces on top of it. And we can still reproduce the problem.
>>>>
>>> This was related to the "did if_hw_tsomax set tp->t_tsomax to the same value?" question. Since you reported that my patch that set if_hw_tsomax in the driver didn't fix the problem, that suggests that tp->t_tsomax isn't being set to if_hw_tsomax from the driver, but we don't know why?
>>
>> Jack asked us to remove lagg/vlans in the very beginning of this thread, and when we had done that, the problem was still there. So my answer was not related to your recent patch. I wanted to clarify that we have been testing with ixgbe only for quite some time and that stacked interfaces could not be a source of problems in our test scenario.
>>
>> We have just started testing your patch that sets if_hw_tsomax yesterday.
RFC: How to fix the NFS/iSCSI vs TSO problem
Hi,

First off, I hope you don't mind that I cross-posted this, but I wanted to make sure both the NFS/iSCSI and networking types see it.

If you look in this mailing list thread:
http://docs.FreeBSD.org/cgi/mid.cgi?1850411724.1687820.1395621539316.JavaMail.root
you'll see that several people have been working hard at testing, and thanks to them, I think I now know what is going on. (This applies to network drivers that support TSO and are limited to 32 transmit segments -> 32 mbufs in chain.)

Doing a quick search I found the following drivers that appear to be affected (I may have missed some):
jme, fxp, age, sge, msk, alc, ale, ixgbe/ix, nfe, e1000/em, re

Further, of these drivers, the following use m_collapse() and not m_defrag() to try and reduce the # of mbufs in the chain. m_collapse() is not going to get the 35 mbufs down to 32 mbufs, as far as I can see, so these ones are more badly broken:
jme, fxp, age, sge, alc, ale, nfe, re

The long description is in the above thread, but the short version is:
- NFS generates a chain with 35 mbufs in it (for read/readdir replies and write requests), made up of a tcpip header, RPC header, NFS args, and 32 clusters of file data.
- tcp_output() usually trims the data size down to tp->t_tsomax (65535) and then some more to make it an exact multiple of the TCP transmit data size.
- The net driver prepends an ethernet header, growing the length by 14 (or sometimes 18 for vlans), but in the first mbuf and not adding one to the chain.
- m_defrag() copies this to a chain of 32 mbuf clusters (because the total data length is <= 64K) and it gets sent.

However, if the data length is a little less than 64K when passed to tcp_output(), so that the length including headers is in the range 65519->65535:
- tcp_output() doesn't reduce its size.
- The net driver adds an ethernet header, making the total data length slightly greater than 64K.
- m_defrag() copies it to a chain of 33 mbuf clusters, which fails with EFBIG.
--> This trainwrecks NFS performance, because the TSO segment is dropped instead of sent.

A tester also stated that the problem could be reproduced using iSCSI. Maybe Edward Napierala might know some details w.r.t. what kind of mbuf chain iSCSI generates?

Also, one tester has reported that setting if_hw_tsomax in the driver before the ether_ifattach() call didn't make the value of tp->t_tsomax smaller. However, reducing IP_MAXPACKET (which is what it is set to by default) did reduce it. I have no idea why this happens or how to fix it, but it implies that setting if_hw_tsomax in the driver isn't a solution until this is resolved.

So, what to do about this?

First, I'd like a simple fix/workaround that can go into 9.3 (which is code freeze in May). The best thing I can think of is setting if_hw_tsomax to a smaller default value. (Line# 658 of sys/net/if.c in head.)

Version A: replace

    ifp->if_hw_tsomax = IP_MAXPACKET;

with

    ifp->if_hw_tsomax = min(32 * MCLBYTES - (ETHER_HDR_LEN +
        ETHER_VLAN_ENCAP_LEN), IP_MAXPACKET);

plus replace m_collapse() with m_defrag() in the drivers listed above.

This would only reduce the default from 65535->65518, so it only impacts the uncommon case where the output size (with tcpip header) is within this range. (As such, I don't think it would have a negative impact for drivers that handle more than 32 transmit segments.) From the testers, it seems that this is sufficient to get rid of the EFBIG errors. (The total data length including ethernet header doesn't exceed 64K, so m_defrag() fits it into 32 mbuf clusters.)
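To make the failing window concrete (my arithmetic, using only the constants named above: MCLBYTES = 2048, ETHER_HDR_LEN = 14, ETHER_VLAN_ENCAP_LEN = 4):

    32 * MCLBYTES = 65536 bytes available in 32 clusters
    TSO frame of 65519..65535 + 14-byte ethernet header = 65533..65549 > 65536
        -> m_defrag() needs a 33rd cluster -> EFBIG
    Version A cap: 65536 - (14 + 4) = 65518
        65518 + 18 (vlan) = 65536 <= 32 * MCLBYTES, so 32 clusters always suffice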
The main downside of this is that there will be a lot of m_defrag() calls being done, and they do quite a bit of bcopy()'ng.

Version B: replace
   ifp->if_hw_tsomax = IP_MAXPACKET;
with
   ifp->if_hw_tsomax = min(29 * MCLBYTES, IP_MAXPACKET);

This one would avoid the m_defrag() calls, but might have a negative impact on TSO performance for drivers that can handle 35 transmit segments, since the maximum TSO segment size is reduced by about 6K. (Because of the second size reduction to an exact multiple of the TCP transmit data size, the exact amount varies.)

Possible longer term fixes:

One longer term fix might be to add something like if_hw_tsomaxseg, so that a driver can set a limit on the number of transmit segments (mbufs in chain) and tcp_output() could use that to limit the size of the TSO segment, as required; a rough sketch of the idea is appended below. (I have a first stab at such a patch, but no way to test it, so I can't see that being done by May. Also, it would require changes to a lot of drivers to make it work. I've attached this patch, in case anyone wants to work on it?)

Another might be to increase the size of MCLBYTES (I don't see this as practical for 9.3, although the actual change is simple). I do think that increasing MCLBYTES might be something to consider doing in the future, for reasons beyond fixing this.

So, what do others think should be done?

rick
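As an addendum to the if_hw_tsomaxseg idea above, here is a minimal, untested sketch of how tcp_output() might clamp a TSO burst by segment count. The field name t_tsomaxseg and the assumption of one header mbuf plus cluster-sized data mbufs are illustrative only, not taken from the attached patch:

   /*
    * Sketch: limit the TSO burst so that the resulting mbuf chain
    * (1 header mbuf + data split into MCLBYTES-sized clusters) stays
    * within the driver's advertised transmit segment limit.
    * t_tsomaxseg is a hypothetical per-connection copy of the
    * driver's if_hw_tsomaxseg.
    */
   if (tso) {
           u_long max_tso_len;

           max_tso_len = (u_long)(tp->t_tsomaxseg - 1) * MCLBYTES - hdrlen;
           if (len > max_tso_len) {
                   len = max_tso_len;
                   sendalot = 1;
           }
   }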
Re: 9.2 ixgbe tx queue hang
Christopher Forgeron wrote:
> Update:
>
> I'm changing my mind, and I believe Rick's TSO patch is fixing things
> (sorry). In looking at my notes, it's possible I had lagg on for those
> tests. lagg does seem to negate the TSO patch in my case.
>
Ok, that's useful information. It implies that r251296 doesn't quite
work and needs to be fixed for "stacked" network interface drivers
before it can be used. I've cc'd Andre, who is the author of that
patch, in case he knows how to fix it.

Thanks for checking this, rick

> kernel.10stable_basicTSO_65535/
>
> - IP_MAXPACKET = 65535;
> - manually forced (no if statement) ifp->if_hw_tsomax = IP_MAXPACKET -
>   (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
> - Verified on boot via printf that ifp->if_hw_tsomax = 65517
> - Boot in a NON LAGG environment. ix0 only.
>
> ixgbe's printf is showing packets up to 65530. Haven't run long enough yet
> to see if anything will go over 65535.
>
> I have this tcpdump running to check packet size.
> tcpdump -ennvvXS -i ix0 greater 65518
>
> I do expect to get packets over 65518, but I was just curious to see if any
> of them would go over 65535. Time will tell.
>
> In a separate test, if I enable lagg, we have LOTS of oversized packet
> problems. It looks like tsomax is definitely not making it through in
> if_lagg.c - any recommendations there? I will eventually need lagg, as I'm
> sure will others.
>
> With dtrace, it's showing t_tsomax >= 65518. Shouldn't that not be
> happening?
>
> dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 &&
> args[0]->t_tsomax >= 65518 / { printf("unexpected tp->t_tsomax: %i\n",
> args[0]->t_tsomax); stack(); }'
>
>   6  31403  tcp_output:entry  unexpected tp->t_tsomax: 65535
>
>               kernel`tcp_do_segment+0x2c99
>               kernel`tcp_input+0x11a2
>               kernel`ip_input+0xa2
>               kernel`netisr_dispatch_src+0x5e
>               kernel`ether_demux+0x12a
>               kernel`ether_nh_input+0x35f
>               kernel`netisr_dispatch_src+0x5e
>               kernel`bce_intr+0x765
>               kernel`intr_event_execute_handlers+0xab
>               kernel`ithread_loop+0x96
>               kernel`fork_exit+0x9a
>               kernel`0x80c75b2e
>
> [three further identical stack traces, all reporting tp->t_tsomax = 65535, trimmed]
Re: 9.2 ixgbe tx queue hang
On 26.03.2014, at 00:06, Christopher Forgeron wrote:
> Update:
>
> I'm changing my mind, and I believe Rick's TSO patch is fixing things
> (sorry). In looking at my notes, it's possible I had lagg on for those
> tests. lagg does seem to negate the TSO patch in my case.

I'm glad to hear you could check that scenario again. In the other email I
just sent, I asked you to redo this test. Now it makes perfect sense why you
saw oversized packets despite Rick's if_hw_tsomax patch.

> kernel.10stable_basicTSO_65535/
>
> - IP_MAXPACKET = 65535;
> - manually forced (no if statement) ifp->if_hw_tsomax = IP_MAXPACKET -
>   (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
> - Verified on boot via printf that ifp->if_hw_tsomax = 65517

Is 65517 correct? With Rick's patch, I get this:

dev.ix.0.hw_tsomax: 65518

Also, the dtrace command you used excludes 65518...

> - Boot in a NON LAGG environment. ix0 only.
>
> ixgbe's printf is showing packets up to 65530. Haven't run long enough yet
> to see if anything will go over 65535.
>
> I have this tcpdump running to check packet size.
> tcpdump -ennvvXS -i ix0 greater 65518
>
> I do expect to get packets over 65518, but I was just curious to see if any
> of them would go over 65535. Time will tell.
>
> In a separate test, if I enable lagg, we have LOTS of oversized packet
> problems. It looks like tsomax is definitely not making it through in
> if_lagg.c - any recommendations there? I will eventually need lagg, as I'm
> sure will others.

I think somebody has to invent a way to propagate if_hw_tsomax to interfaces
stacked on top of each other.

> With dtrace, it's showing t_tsomax >= 65518. Shouldn't that not be
> happening?

Looks like these all come from bce interfaces (bce_intr in the stack trace),
which probably have another value for if_hw_tsomax.
Markus

> dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 &&
> args[0]->t_tsomax >= 65518 / { printf("unexpected tp->t_tsomax: %i\n",
> args[0]->t_tsomax); stack(); }'
>
> [quoted dtrace stack traces trimmed; see the earlier message for the full output]
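For reference, a minimal, untested sketch of how a read-only sysctl like the dev.ix.0.hw_tsomax shown above could be exported from a driver's attach routine; the placement and the 'dev'/'ifp' variables are assumptions about the driver's attach code, not taken from Markus's actual patch:

   /*
    * Sketch: export if_hw_tsomax (a u_int in struct ifnet) as a
    * read-only sysctl under the device's tree, e.g. dev.ix.0.hw_tsomax.
    * Add this after if_hw_tsomax has been set in the attach routine.
    */
   SYSCTL_ADD_UINT(device_get_sysctl_ctx(dev),
       SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
       OID_AUTO, "hw_tsomax", CTLFLAG_RD,
       &ifp->if_hw_tsomax, 0,
       "Maximum TSO segment size advertised to TCP");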
Re: 9.2 ixgbe tx queue hang
Markus Gebert wrote:
>
> On 26.03.2014, at 00:06, Christopher Forgeron wrote:
>
> > Update:
> >
> > I'm changing my mind, and I believe Rick's TSO patch is fixing things
> > (sorry). In looking at my notes, it's possible I had lagg on for those
> > tests. lagg does seem to negate the TSO patch in my case.
>
> I'm glad to hear you could check that scenario again. In the other
> email I just sent, I asked you to redo this test. Now it makes
> perfect sense why you saw oversized packets despite Rick's
> if_hw_tsomax patch.
>
> > kernel.10stable_basicTSO_65535/
> >
> > - IP_MAXPACKET = 65535;
> > - manually forced (no if statement) ifp->if_hw_tsomax = IP_MAXPACKET -
> >   (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
> > - Verified on boot via printf that ifp->if_hw_tsomax = 65517
>
> Is 65517 correct? With Rick's patch, I get this:
>
> dev.ix.0.hw_tsomax: 65518
>
> Also the dtrace command you used excludes 65518...
>
I am using 32 * MCLBYTES - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN), which is
65518. Although IP_MAXPACKET (the maximum IP length, not including the
ethernet header) is 65535 (the largest # that fits in 16 bits), the maximum
data length (including the ethernet header) that will fit in 32 mbuf
clusters is 65536. (In practice, 65517 or anything <= 65518 should fix the
problem.)

rick

> > - Boot in a NON LAGG environment. ix0 only.
> >
> > ixgbe's printf is showing packets up to 65530. Haven't run long
> > enough yet to see if anything will go over 65535.
>
With the ethernet header length, it can be <= 65536, because that is
32 * MCLBYTES. rick

> > I have this tcpdump running to check packet size.
> > tcpdump -ennvvXS -i ix0 greater 65518
> >
> > I do expect to get packets over 65518, but I was just curious to
> > see if any of them would go over 65535. Time will tell.
> >
> > In a separate test, if I enable lagg, we have LOTS of oversized
> > packet problems. It looks like tsomax is definitely not making it
> > through in if_lagg.c - any recommendations there? I will eventually
> > need lagg, as I'm sure will others.
>
> I think somebody has to invent a way to propagate if_hw_tsomax to
> interfaces stacked on top of each other.
>
> > With dtrace, it's showing t_tsomax >= 65518. Shouldn't that not be
> > happening?
>
> Looks like these all come from bce interfaces (bce_intr in the stack
> trace), which probably have another value for if_hw_tsomax.
>
> Markus
>
> > [quoted dtrace output trimmed; see the earlier messages for the full traces]
Re: 9.2 ixgbe tx queue hang
That's interesting. I see that in the r251296 commit Andre says:

   Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
   change the limit.

I wonder if adding your same TSO patch to if_lagg.c, before the
ether_ifattach() call at line 356, will fix it. Ultimately, it will need to
load if_hw_tsomax from the interface below it - but then again, if the
calculation for ixgbe is good enough for that driver, why wouldn't it be
good enough for lagg?

Unless people think I'm crazy, I'll compile that in at line 356 in
if_lagg.c (something like the sketch after the quoted text below) and give
it a test run tomorrow. This may need to go into vlan and carp as well; I'm
not sure yet.

On Tue, Mar 25, 2014 at 8:16 PM, Rick Macklem wrote:

> Christopher Forgeron wrote:
> > Update:
> >
> > I'm changing my mind, and I believe Rick's TSO patch is fixing things
> > (sorry). In looking at my notes, it's possible I had lagg on for those
> > tests. lagg does seem to negate the TSO patch in my case.
> >
> Ok, that's useful information. It implies that r251296 doesn't quite
> work and needs to be fixed for "stacked" network interface drivers
> before it can be used. I've cc'd Andre who is the author of that
> patch, in case he knows how to fix it.
>
> Thanks for checking this, rick
>
> [rest of quoted message trimmed; see the earlier messages for the full text]
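For concreteness, a minimal, untested sketch of the if_lagg.c idea proposed above. The placement before ether_ifattach() follows Andre's commit message; the function name lagg_clone_create() and the reuse of the 32-cluster value from the ixgbe patch are assumptions:

   /*
    * Sketch: in lagg_clone_create(), before ether_ifattach(ifp, ...),
    * advertise a TSO limit that leaves room for an ethernet + vlan
    * header within 32 mbuf clusters, mirroring the ixgbe patch.
    */
   ifp->if_hw_tsomax = min(32 * MCLBYTES -
       (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN), IP_MAXPACKET);

A more complete fix would presumably take the minimum of the if_hw_tsomax values of the lagg's member ports instead of this hard-coded default.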
Re: RFC: How to fix the NFS/iSCSI vs TSO problem
On Tue, Mar 25, 2014 at 07:10:35PM -0400, Rick Macklem wrote:
> Hi,
>
> First off, I hope you don't mind that I cross-posted this,
> but I wanted to make sure both the NFS/iSCSI and networking
> types see it.
> If you look in this mailing list thread:
>
> http://docs.FreeBSD.org/cgi/mid.cgi?1850411724.1687820.1395621539316.JavaMail.root
>
> you'll see that several people have been working hard at testing and
> thanks to them, I think I now know what is going on.

Thanks for your hard work on narrowing down that issue. I'm too busy with
$work these days, so I couldn't find time to investigate the issue.

> (This applies to network drivers that support TSO and are limited to
> 32 transmit segments->32 mbufs in chain.) Doing a quick search I found
> the following drivers that appear to be affected (I may have missed some):
> jme, fxp, age, sge, msk, alc, ale, ixgbe/ix, nfe, e1000/em, re

The magic number 32 was chosen a long time ago, when I implemented TSO in
non-Intel drivers. I tried to find an optimal number to reduce kernel stack
usage at that time. bus_dma(9) will coalesce with the previous segment if
possible, so I thought the number 32 was not an issue. Not sure whether
current bus_dma(9) still has the same code, though. The number 32 is an
arbitrary one, so you can increase it if you want.

> Further, of these drivers, the following use m_collapse() and not
> m_defrag() to try and reduce the # of mbufs in the chain. m_collapse()
> is not going to get the 35 mbufs down to 32 mbufs, as far as I can see,
> so these ones are more badly broken:
> jme, fxp, age, sge, alc, ale, nfe, re

I guess m_defrag(9) is more optimized for non-TSO packets. You don't want
to waste CPU cycles copying the full frame just to reduce the number of
mbufs in the chain. For TSO packets, m_defrag(9) looks better, but if we
always have to copy a full TSO packet to make TSO work, driver writers will
have to invent a better scheme rather than blindly relying on m_defrag(9),
I guess.

> [rest of quoted RFC trimmed; see the original message earlier in this thread]
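To illustrate the driver-side pattern under discussion, here is a hedged sketch of the usual FreeBSD transmit-path fallback: try to DMA-map the chain, and on EFBIG compact it and retry once. The names MYDRV_MAXTXSEGS, sc->tx_tag and txd->map are placeholders, not taken from any specific driver:

   /*
    * Sketch of a typical encap path. Using m_defrag() (full copy into
    * as few clusters as possible) rather than m_collapse() matters
    * here, because m_collapse() cannot squeeze 35 mbufs into 32.
    */
   bus_dma_segment_t segs[MYDRV_MAXTXSEGS];    /* e.g. 32 */
   struct mbuf *m;
   int error, nsegs;

   error = bus_dmamap_load_mbuf_sg(sc->tx_tag, txd->map,
       *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
   if (error == EFBIG) {
           m = m_defrag(*m_head, M_NOWAIT);
           if (m == NULL) {
                   m_freem(*m_head);
                   *m_head = NULL;
                   return (ENOBUFS);
           }
           *m_head = m;
           error = bus_dmamap_load_mbuf_sg(sc->tx_tag, txd->map,
               *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
   }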
Re: 9.2 ixgbe tx queue hang
On Tue, Mar 25, 2014 at 8:21 PM, Markus Gebert wrote:
>
> Is 65517 correct? With Rick's patch, I get this:
>
> dev.ix.0.hw_tsomax: 65518
>

Perhaps a difference between 9.2 and 10 for one of the macros? My code is:

   ifp->if_hw_tsomax = IP_MAXPACKET - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
   printf("CSF - 3 Init, ifp->if_hw_tsomax = %d\n", ifp->if_hw_tsomax);

(BTW, you should submit the hw_tsomax sysctl patch; that's useful to others.)

> Also the dtrace command you used excludes 65518...
>

Oh, I thought it was giving every packet that is greater than or equal to
65518 - could you show me the proper command? That's the third time I've
used dtrace, so I'm making this up as I go. :-)
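One way to sidestep the predicate question entirely is to aggregate every t_tsomax value tcp_output() sees, rather than filtering for "unexpected" ones; a minimal sketch using the same fbt probe as above:

   # dtrace -n 'fbt::tcp_output:entry { @vals[args[0]->t_tsomax] = count(); }'

On interrupt this prints each distinct t_tsomax value with a hit count, so both the expected value (65518, or 65517 with the patch above) and any stragglers from other interfaces (such as the bce ones seen earlier) show up at a glance.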
Re: syslogd:sendto: no buffer available on 10-stable
Thanks, Christopher. But I think my problem may not be related to the TSO
issue. I have tried disabling TSO with "ifconfig igb(x) -tso" and observed
with "netstat -ihw 1", and found that the "oErrs" did not disappear.

Regards
Simon

On 14-3-25 22:08, Christopher Forgeron wrote:

Hi Simon,

Try checking out the "9.2 ixgbe tx queue hang" thread here, and see if it
applies to you.

On Tue, Mar 25, 2014 at 1:55 AM, k simon <chio1...@gmail.com> wrote:

Hi, Lists:

I get lots of "no buffer available" errors on 10-stable with an igb nic,
but em and bce work well. I tried forcing igb to 4 or 8 queues and set
hw.igb.rx_process_limit="-1", but nothing helped.

Regards
Simon

# uname -a
FreeBSD sq-l1-n2 10.0-STABLE FreeBSD 10.0-STABLE #0 r262469: Tue Feb 25 13:27:11 CST 2014 root@sq-l1-n2:/usr/obj/usr/src/sys/stable-10-262458 amd64

# netstat -mb
19126/73289/92415 mbufs in use (current/cache/total)
13289/46841/60130/524288 mbuf clusters in use (current/cache/total/max)
13289/46812 mbuf+clusters out of packet secondary zone in use (current/cache)
5638/22605/28243/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/77672 9k jumbo clusters in use (current/cache/total/max)
0/0/0/43690 16k jumbo clusters in use (current/cache/total/max)
53914K/202424K/256338K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

# netstat -di
Name  Mtu    Network  Address            Ipkts        Ierrs     Idrop  Opkts        Oerrs  Coll  Drop
igb0  1500            00:1b:21:70:5f:80  17212101113  1355809   0      19612978862  0      0
igb1  1500            00:1b:21:70:5f:81  76601294282  81162751  0      74236432686  0      0
lo0   16384           20532742636        0            0         20522475797  0      0
lo0   -      your-net localhost          2736994243   -         -      20520227166  -      -

# sysctl hw.igb
hw.igb.rxd: 2048
hw.igb.txd: 2048
hw.igb.enable_aim: 1
hw.igb.enable_msix: 1
hw.igb.max_interrupt_rate: 12000
hw.igb.buf_ring_size: 4096
hw.igb.header_split: 0
hw.igb.num_queues: 1
hw.igb.rx_process_limit: 1000

# sysctl dev.igb.1
dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.1.%driver: igb
dev.igb.1.%location: slot=0 function=1
dev.igb.1.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x8086 subdevice=0xa04c class=0x02
dev.igb.1.%parent: pci8
dev.igb.1.nvm: -1
dev.igb.1.enable_aim: 1
dev.igb.1.fc: 0
dev.igb.1.rx_processing_limit: 4096
dev.igb.1.link_irq: 3
dev.igb.1.dropped: 0
dev.igb.1.tx_dma_fail: 0
dev.igb.1.rx_overruns: 0
dev.igb.1.watchdog_timeouts: 0
dev.igb.1.device_control: 1086325313
dev.igb.1.rx_control: 67141634
dev.igb.1.interrupt_mask: 4
dev.igb.1.extended_int_mask: 2147483651
dev.igb.1.tx_buf_alloc: 0
dev.igb.1.rx_buf_alloc: 0
dev.igb.1.fc_high_water: 58976
dev.igb.1.fc_low_water: 58960
dev.igb.1.queue0.no_desc_avail: 10874
dev.igb.1.queue0.tx_packets: 74509997338
dev.igb.1.queue0.rx_packets: 76837720630
dev.igb.1.queue0.rx_bytes: 35589607860237
dev.igb.1.queue0.lro_queued: 0
dev.igb.1.queue0.lro_flushed: 0
dev.igb.1.mac_stats.excess_coll: 0
dev.igb.1.mac_stats.single_coll: 0
dev.igb.1.mac_stats.multiple_coll: 0
dev.igb.1.mac_stats.late_coll: 0
dev.igb.1.mac_stats.collision_count: 0
dev.igb.1.mac_stats.symbol_errors: 0
dev.igb.1.mac_stats.sequence_errors: 0
dev.igb.1.mac_stats.defer_count: 0
dev.igb.1.mac_stats.missed_packets: 81162751
dev.igb.1.mac_stats.recv_no_buff: 176691324
dev.igb.1.mac_stats.recv_undersize: 0
dev.igb.1.mac_stats.recv_fragmented: 0
dev.igb.1.mac_stats.recv_oversize: 0
dev.igb.1.mac_stats.recv_jabber: 0
dev.igb.1.mac_stats.recv_errs: 0
dev.igb.1.mac_stats.crc_errs: 0
dev.igb.1.mac_stats.alignment_errs: 0
dev.igb.1.mac_stats.coll_ext_errs: 0
dev.igb.1.mac_stats.xon_recvd: 0
dev.igb.1.mac_stats.xon_txd: 0
dev.igb.1.mac_stats.xoff_recvd: 0
dev.igb.1.mac_stats.xoff_txd: 0
dev.igb.1.mac_stats.total_pkts_recvd: 76925709917
dev.igb.1.mac_stats.good_pkts_recvd: 76837704301
dev.igb.1.mac_stats.bcast_pkts_recvd: 49174716
dev.igb.1.mac_stats.mcast_pkts_recvd: 282670
dev.igb.1.mac_stats.rx_frames_64: 31057121854
dev.igb.1.mac_stats.rx_frames_65_127: 19996324498
dev.igb.1.mac_stats.rx_frames_128_255: 1171960837
dev.igb.1.mac_stats.rx_frames_256_511: 2295894674
dev.igb.1.mac_stats.rx_frames_512_1023: 2026241