Re: 9.2 ixgbe tx queue hang

2014-04-02 Thread Rick Macklem
K Simon wrote: > Hi, Rick, > Will these patches be committed to stable soon, or do I have to > patch it manually? > Yonghyeon Pyun has already committed the changes for the drivers to head (making them handle 35 mbufs in the chain instead of 32). I'll assume those will be in stable in a cou

Re: 9.2 ixgbe tx queue hang

2014-04-01 Thread k simon
Hi, Rick, Will these patches be committed to stable soon, or do I have to patch it manually? Regards Simon On 14-3-28 6:44, Rick Macklem wrote: Christopher Forgeron wrote: On Wed, Mar 26, 2014 at 9:35 PM, Rick Macklem < rmack...@uoguelph.ca wrote: I've suggested in the other thread

Re: 9.2 ixgbe tx queue hang

2014-03-27 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > > On Wed, Mar 26, 2014 at 9:35 PM, Rick Macklem < rmack...@uoguelph.ca > > wrote: > > > > > I've suggested in the other thread what you suggested in a recent > post...ie. to change the default, at least until the propagation > of driver set values i

Re: 9.2 ixgbe tx queue hang

2014-03-27 Thread Markus Gebert
On 26.03.2014, at 03:33, Christopher Forgeron wrote: > On Tue, Mar 25, 2014 at 8:21 PM, Markus Gebert > wrote: > >> >> >> Is 65517 correct? With Ricks patch, I get this: >> >> dev.ix.0.hw_tsomax: 65518 >> > > Perhaps a difference between 9.2 and 10 for one of the macros? My code is: > >

Re: 9.2 ixgbe tx queue hang

2014-03-27 Thread Christopher Forgeron
On Wed, Mar 26, 2014 at 9:35 PM, Rick Macklem wrote: > > > I've suggested in the other thread what you suggested in a recent > post...ie. to change the default, at least until the propagation > of driver set values is resolved. > > rick > I wonder if we need to worry about propagating values up

Re: 9.2 ixgbe tx queue hang

2014-03-27 Thread Christopher Forgeron
On Wed, Mar 26, 2014 at 9:31 PM, Rick Macklem wrote: > > ie. I've suggested: > ifp->if_hw_tsomax = min(32 * MCLBYTES - (ETHER_HDR_LEN + > ETHER_VLAN_ENCAP_LEN), > IP_MAXPACKET); > - I put the min() in just so it wouldn't break if MCLBYTES is increased > someday. > I like the added s
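For readers following the numbers, here is a small user-space sketch of the arithmetic behind the formula quoted above. The constant values (65535, 14, 4, 2048) are assumptions that mirror the usual FreeBSD definitions of IP_MAXPACKET, ETHER_HDR_LEN, ETHER_VLAN_ENCAP_LEN and MCLBYTES; the SK_ names and both function names are made up for illustration, and this is not the driver code.

```c
/* Assumed values, mirroring the usual FreeBSD definitions. */
#define SK_IP_MAXPACKET         65535  /* IP_MAXPACKET */
#define SK_ETHER_HDR_LEN        14     /* ETHER_HDR_LEN */
#define SK_ETHER_VLAN_ENCAP_LEN 4      /* ETHER_VLAN_ENCAP_LEN */
#define SK_MCLBYTES             2048   /* MCLBYTES (default cluster size) */

/* Limit derived only from the maximum IP packet size. */
int tsomax_from_maxpacket(void)
{
    return SK_IP_MAXPACKET - (SK_ETHER_HDR_LEN + SK_ETHER_VLAN_ENCAP_LEN);
}

/* Limit derived from a chain of nclusters clusters, capped at IP_MAXPACKET. */
int tsomax_from_clusters(int nclusters)
{
    int limit = nclusters * SK_MCLBYTES
        - (SK_ETHER_HDR_LEN + SK_ETHER_VLAN_ENCAP_LEN);

    return limit < SK_IP_MAXPACKET ? limit : SK_IP_MAXPACKET;
}
```

With these assumed values, tsomax_from_maxpacket() gives 65517 and tsomax_from_clusters(32) gives 65518, which are the two hw_tsomax figures being compared elsewhere in this thread. Whether that fully accounts for the difference people saw is not confirmed by the excerpts here.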

Re: 9.2 ixgbe tx queue hang

2014-03-26 Thread Rick Macklem
Christopher Forgeron wrote: > That's interesting. I see here in the r251296 commit Andre says : > > Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() > to > change the limit. > > I wonder if we add your same TSO patch to if_lagg.c before line > 356's > ether_ifattach() wil

Re: 9.2 ixgbe tx queue hang

2014-03-26 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > On Tue, Mar 25, 2014 at 8:21 PM, Markus Gebert < > markus.geb...@hostpoint.ch > wrote: > > > > > > Is 65517 correct? With Ricks patch, I get this: > > dev.ix.0.hw_tsomax: 65518 > > > > Perhaps a difference between 9.2 and 10 for one of the macro

Re: 9.2 ixgbe tx queue hang

2014-03-26 Thread Christopher Forgeron
Confirmed that adding this to sys/net/if.c fixes the issue for lagg as well as ixgbe. 660:if (ifp->if_hw_tsomax == 0) 661:ifp->if_hw_tsomax = IP_MAXPACKET - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN); Code before (looks to be introduced in 9.2, r251296 as Rick mentions above) just

Re: 9.2 ixgbe tx queue hang

2014-03-26 Thread Christopher Forgeron
Up for almost 19 hours under load without a single error. I would say the TSO patch does work, now I'm going to run lagg tests. The more I think of it, the more I wonder if setting tsomax in if.c at line 660 isn't the better idea, like below. 660:if (ifp->if_hw_tsomax == 0) 661:

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Christopher Forgeron
On Tue, Mar 25, 2014 at 8:21 PM, Markus Gebert wrote: > > > Is 65517 correct? With Ricks patch, I get this: > > dev.ix.0.hw_tsomax: 65518 > Perhaps a difference between 9.2 and 10 for one of the macros? My code is: ifp->if_hw_tsomax = IP_MAXPACKET - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Christopher Forgeron
That's interesting. I see here in the r251296 commit Andre says : Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit. I wonder if we add your same TSO patch to if_lagg.c before line 356's ether_ifattach() will fix it. Ultimately, it will need to load the

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Rick Macklem
Markus Gebert wrote: > > On 26.03.2014, at 00:06, Christopher Forgeron > wrote: > > > Update: > > > > I'm changing my mind, and I believe Rick's TSO patch is fixing > > things > > (sorry). In looking at my notes, it's possible I had lagg on for > > those > > tests. lagg does seem to negate the

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Markus Gebert
On 26.03.2014, at 00:06, Christopher Forgeron wrote: > Update: > > I'm changing my mind, and I believe Rick's TSO patch is fixing things > (sorry). In looking at my notes, it's possible I had lagg on for those > tests. lagg does seem to negate the TSO patch in my case. I’m glad to hear you co

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Rick Macklem
Christopher Forgeron wrote: > Update: > > I'm changing my mind, and I believe Rick's TSO patch is fixing > things > (sorry). In looking at my notes, it's possible I had lagg on for > those > tests. lagg does seem to negate the TSO patch in my case. > Ok, that's useful information. It implies t

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Markus Gebert
On 25.03.2014, at 23:21, Rick Macklem wrote: > Markus Gebert wrote: >> >> On 25.03.2014, at 22:46, Rick Macklem wrote: >> >>> Markus Gebert wrote: On 25.03.2014, at 02:18, Rick Macklem wrote: > Christopher Forgeron wrote: >> >> >> >> This is regar

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Christopher Forgeron
Update: I'm changing my mind, and I believe Rick's TSO patch is fixing things (sorry). In looking at my notes, it's possible I had lagg on for those tests. lagg does seem to negate the TSO patch in my case. kernel.10stable_basicTSO_65535/ - IP_MAXPACKET = 65535; - manually forced (no if statem

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Rick Macklem
Markus Gebert wrote: > > On 25.03.2014, at 22:46, Rick Macklem wrote: > > > Markus Gebert wrote: > >> > >> On 25.03.2014, at 02:18, Rick Macklem > >> wrote: > >> > >>> Christopher Forgeron wrote: > > > > This is regarding the TSO patch that Rick suggested earlier. > >>>

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Markus Gebert
On 25.03.2014, at 22:46, Rick Macklem wrote: > Markus Gebert wrote: >> >> On 25.03.2014, at 02:18, Rick Macklem wrote: >> >>> Christopher Forgeron wrote: This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion) >>>

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Rick Macklem
Markus Gebert wrote: > > On 25.03.2014, at 02:18, Rick Macklem wrote: > > > Christopher Forgeron wrote: > >> > >> > >> > >> This is regarding the TSO patch that Rick suggested earlier. (With > >> many thanks for his time and suggestion) > >> > >> > >> As I mentioned earlier, it did not fix

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Christopher Forgeron
I'm quite positive that an IP_MAXPACKET = 65518 would fix this, as I've never seen a packet overshoot by more than 11 bytes, although that's just in my case. It's next up on my test list. BTW, to answer the next message: I am experiencing the error with a raw ix or lagg interface. Originally I w

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Christopher Forgeron
Hi guys, I'm in meetings today, so I'll respond to the other emails later. Just wanted to clarify about tp->t_tsomax : I can't make a solid assertion about its value as I only tracked it briefly. I did see it being != if_hw_tsomax, but that was a short test and should really be checked more ca

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Johan Kooijman
Hey guys, I have nothing on your code level to add, but.. while investigating this issue I ran into the guy that originally created the bug ( http://www.freebsd.org/cgi/query-pr.cgi?pr=183390&cat=). In the email exchange that followed he told me that he had found a workaround by running a specific -S

Re: 9.2 ixgbe tx queue hang

2014-03-25 Thread Markus Gebert
On 25.03.2014, at 02:18, Rick Macklem wrote: > Christopher Forgeron wrote: >> >> >> >> This is regarding the TSO patch that Rick suggested earlier. (With >> many thanks for his time and suggestion) >> >> >> As I mentioned earlier, it did not fix the issue on a 10.0 system. It >> did make it

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Rick Macklem
Julian Elischer wrote: - Original Message - > I wrote (and snipped): >> Other drivers (and ixgbe for the 82598 chip) can handle a packet that >> is in more than 32 mbufs. (I think the 82598 handles 100, grep for >> SCATTER >> in *.h in sys/dev/ixgbe.) >> > > the Xen backend can not handle m

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Rick Macklem
Christopher Forgeron wrote: > > > > This is regarding the TSO patch that Rick suggested earlier. (With > many thanks for his time and suggestion) > > > As I mentioned earlier, it did not fix the issue on a 10.0 system. It > did make it less of a problem on 9.2, but either way, I think it's > n

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Rick Macklem
Markus Gebert wrote: > > On 24.03.2014, at 16:21, Christopher Forgeron > wrote: > > > This is regarding the TSO patch that Rick suggested earlier. (With > > many > > thanks for his time and suggestion) > > > > As I mentioned earlier, it did not fix the issue on a 10.0 system. > > It did > > mak

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Rick Macklem
Christopher Forgeron wrote: > I'm going to split this into different posts to focus on each topic. > This > is about setting IP_MAXPACKET to 65495 > > Update on Last Night's Run: > > (Last night's run is a kernel with IP_MAXPACKET = 65495) > > - Uptime on this run: 10:53AM up 13:21, 5 users, lo

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Rick Macklem
Julian Elischer wrote: > On 3/23/14, 4:57 PM, Rick Macklem wrote: > > Christopher Forgeron wrote: > >> > >> > >> > >> > >> > >> On Sat, Mar 22, 2014 at 6:41 PM, Rick Macklem < > >> rmack...@uoguelph.ca > >>> wrote: > >> > >> > >> Christopher Forgeron wrote: > >>> #if defined(INET) || defined(INET6)

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Christopher Forgeron
Well, a few more hours of running, and it's fairly easy to catch the packets with tcpdump, but not as easy to see if there is a pattern to them or what is different about them from the other packets that do pass with normal sizes. I'm using: tcpdump -ennvvvSuxx -i ix0 -s 64 greater 65495 here's

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Julian Elischer
On 3/23/14, 4:57 PM, Rick Macklem wrote: Christopher Forgeron wrote: On Sat, Mar 22, 2014 at 6:41 PM, Rick Macklem < rmack...@uoguelph.ca wrote: Christopher Forgeron wrote: #if defined(INET) || defined(INET6) /* Initialize to max value. */ if (ifp->if_hw_tsomax == 0) ifp->if_hw_tsomax

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Markus Gebert
On 24.03.2014, at 17:23, Christopher Forgeron wrote: > I think making hw_tsomax a sysctl would be a good patch to commit - It > could enable easy debugging/performance testing for the masses. > > I'm curious to hear how your environment is working with a tso turned off > on your nics. This wil

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Christopher Forgeron
I think making hw_tsomax a sysctl would be a good patch to commit - It could enable easy debugging/performance testing for the masses. I'm curious to hear how your environment is working with a tso turned off on your nics. My testbed just hit the 2 hour mark. With TSO off, I don't get a single pa

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Markus Gebert
On 24.03.2014, at 16:21, Christopher Forgeron wrote: > This is regarding the TSO patch that Rick suggested earlier. (With many > thanks for his time and suggestion) > > As I mentioned earlier, it did not fix the issue on a 10.0 system. It did > make it less of a problem on 9.2, but either way,

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Christopher Forgeron
This is regarding the TSO patch that Rick suggested earlier. (With many thanks for his time and suggestion) As I mentioned earlier, it did not fix the issue on a 10.0 system. It did make it less of a problem on 9.2, but either way, I think it's not needed, and shouldn't be considered as a patch fo

Re: 9.2 ixgbe tx queue hang

2014-03-24 Thread Christopher Forgeron
I'm going to split this into different posts to focus on each topic. This is about setting IP_MAXPACKET to 65495 Update on Last Night's Run: (Last night's run is a kernel with IP_MAXPACKET = 65495) - Uptime on this run: 10:53AM up 13:21, 5 users, load averages: 1.98, 2.09, 2.13 - Ping logger re

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Christopher Forgeron
Hi, I'll follow up more tomorrow, as it's late and I don't have time for detail. The basic TSO patch didn't work, as packets were still going over 65535 by a fair amount. I thought I wrote that earlier, but I am dumping a lot of info into a few threads, so I apologize if I'm not as concise

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > > > > Update: > > For giggles, I set IP_MAXPACKET = 32768. > Well, I'm pretty sure you don't want to do that, except for an experiment. You can just set if_hw_tsomax to whatever you want to try, at the place my ixgbe.patch put it (just before the ca

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Rick Macklem
Christopher Forgeron wrote: > Hi Rick, very helpful as always. > > > On Sat, Mar 22, 2014 at 6:18 PM, Rick Macklem > wrote: > > > Christopher Forgeron wrote: > > > > Well, you could try making if_hw_tsomax somewhat smaller. (I can't > > see > > how the packet including ethernet header would be

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > > On Sat, Mar 22, 2014 at 6:41 PM, Rick Macklem < rmack...@uoguelph.ca > > wrote: > > > > Christopher Forgeron wrote: > > #if defined(INET) || defined(INET6) > > /* Initialize to max value. */ > > if (ifp->if_hw_tsomax == 0) > > ifp->if_hw_tsomax = I

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Christopher Forgeron
Update: For giggles, I set IP_MAXPACKET = 32768. Over an hour of runtime, and no issues. This is better than with the TSO patch and the 9.2 ixgbe, as that was just a drastic reduction in errors. Still have an 'angry' netstat -m on boot, and I'm still incrementing denied netbuf calls, so someth

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Christopher Forgeron
On Sat, Mar 22, 2014 at 11:58 PM, Rick Macklem wrote: > Christopher Forgeron wrote: > > > > > Also should we not also subtract ETHER_VLAN_ENCAP_LEN from tsomax to > > make sure VLANs fit? > > > I took a look and, yes, this does seem to be needed. It will only be > needed for the case where a vlan

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Christopher Forgeron
On Sat, Mar 22, 2014 at 6:41 PM, Rick Macklem wrote: > Christopher Forgeron wrote: > > #if defined(INET) || defined(INET6) > > /* Initialize to max value. */ > > if (ifp->if_hw_tsomax == 0) > > ifp->if_hw_tsomax = IP_MAXPACKET; > > KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET && > > ifp->if_hw_tsoma

Re: 9.2 ixgbe tx queue hang

2014-03-23 Thread Christopher Forgeron
Hi Rick, very helpful as always. On Sat, Mar 22, 2014 at 6:18 PM, Rick Macklem wrote: > Christopher Forgeron wrote: > > Well, you could try making if_hw_tsomax somewhat smaller. (I can't see > how the packet including ethernet header would be more than 64K with the > patch, but?? For example, t

Re: 9.2 ixgbe tx queue hang

2014-03-22 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > > Ah yes, I see it now: Line #658 > > #if defined(INET) || defined(INET6) > /* Initialize to max value. */ > if (ifp->if_hw_tsomax == 0) > ifp->if_hw_tsomax = IP_MAXPACKET; > KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET && > ifp->if_hw_tsomax >= IP_MAXPAC

Re: 9.2 ixgbe tx queue hang

2014-03-22 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > > Ah yes, I see it now: Line #658 > > #if defined(INET) || defined(INET6) > /* Initialize to max value. */ > if (ifp->if_hw_tsomax == 0) > ifp->if_hw_tsomax = IP_MAXPACKET; > KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET && > ifp->if_hw_tsomax >= IP_MAXPAC

Re: 9.2 ixgbe tx queue hang

2014-03-22 Thread Rick Macklem
Christopher Forgeron wrote: > Status Update: Hopeful, but not done. > > So the 9.2-STABLE ixgbe with Rick's TSO patch has been running all > night > while iometer hammered away at it. It's got over 8 hours of test time > on > it. > > It's still running, the CPU queues are not clogged, and everyth

Re: 9.2 ixgbe tx queue hang

2014-03-22 Thread Christopher Forgeron
Status Update: Hopeful, but not done. So the 9.2-STABLE ixgbe with Rick's TSO patch has been running all night while iometer hammered away at it. It's got over 8 hours of test time on it. It's still running, the CPU queues are not clogged, and everything is functional. However, my ping_logger.py

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Ah yes, I see it now: Line #658 #if defined(INET) || defined(INET6) /* Initialize to max value. */ if (ifp->if_hw_tsomax == 0) ifp->if_hw_tsomax = IP_MAXPACKET; KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET && ifp->if_hw_tsomax >= IP_MAXPACKET / 8,
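The fallback and the bounds check quoted above can be sketched in user space as follows. This is only an illustration: SK_IP_MAXPACKET is an assumed stand-in for IP_MAXPACKET (65535), both function names are hypothetical, and the real code operates on ifp->if_hw_tsomax inside the kernel.

```c
#include <stdbool.h>

#define SK_IP_MAXPACKET 65535u  /* assumed value of IP_MAXPACKET */

/* Keep a limit set by the driver; use the fallback only when it is 0. */
unsigned int tsomax_or_default(unsigned int driver_value, unsigned int fallback)
{
    return driver_value != 0 ? driver_value : fallback;
}

/* Same bounds as the KASSERT condition quoted above. */
bool tsomax_in_range(unsigned int v)
{
    return v <= SK_IP_MAXPACKET && v >= SK_IP_MAXPACKET / 8;
}
```

Passing the fallback as a parameter makes it easy to compare the stock default (IP_MAXPACKET) with a reduced one such as IP_MAXPACKET minus the Ethernet and VLAN header lengths, which is the variation tested later in the thread.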

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
No errors for 1h 46m - That's a record. This is using the 9.2-STABLE ixgbe in a 10.0-RELEASE system, with Rick's suggested code below. I decided this must be it, so I aborted, and modified the ixgbe driver from 10.0-STABLE with Rick's suggestion. Installed and rebooted. Here's the extra values I p

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Rick Macklem
Christopher Forgeron wrote: > It may be a little early, but I think that's it! > > It's been running without error for nearly an hour - It's very rare > it > would go this long under this much load. > > I'm going to let it run longer, then abort and install the kernel > with the > extra printfs s

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
It may be a little early, but I think that's it! It's been running without error for nearly an hour - It's very rare it would go this long under this much load. I'm going to let it run longer, then abort and install the kernel with the extra printfs so I can see what value ifp->if_hw_tsomax is be

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Well I can tell you that your if statement in that patch is being activated. I added a printf in there, and it showed up in my dmesg. I still see bad things for netstat -m, but I'm starting a load run to see if it makes a difference. Next compile I'll add printouts of what ifp->if_hw_tsomax is on

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Thanks Rick, trying it now. I'm currently working with the 9.2 ixgbe code as a starting point, as I'm curious/encouraged by the lack of jumbo cluster denials in netmap. I'll let you know how it works out. On Fri, Mar 21, 2014 at 8:44 PM, Rick Macklem wrote: > Christopher Forgeron wrote: > > >

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > > Hello all, > > I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer > away at the NFS store overnight - But the problem is still there. > > > From what I read, I think the MJUM9BYTES removal is probably good > cleanup (as long as it

Re: 9.2 ixgbe tx queue hang (packets that exceed 65535bytes in length)

2014-03-21 Thread Christopher Forgeron
Good point - I'm printing where Rick asked, in the 'before' printf statement, which comes before the m = m_defrag(*m_headp, M_NOWAIT); command in ixgbe_xmit I'm going to be adding more printf's to the code to see if I can find anything interesting, your suggestions would be welcome. ..and I suppo

Re: 9.2 ixgbe tx queue hang (packets that exceed 65535bytes in length)

2014-03-21 Thread Christopher Forgeron
Ah, I appreciate your efforts Rick - If you have any final parting hints, please let me know. I'm opening up access from my IDE so I can look at this with something a bit more advanced than the default vi. For the record, I was printing the flags out as an unsigned long, so that should be decimal.

Re: 9.2 ixgbe tx queue hang (packets that exceed 65535bytes in length)

2014-03-21 Thread shiu michael
>Ok, so this isn't a TSO segment then, unless I don't understand how >the csum flags are used, which is quite possible. >Assuming that you printed this out in decimal: >4116->0x1014 >Looking in mbuf.h, 0x1014 is >CSUM_SCTP_VALID | CSUM_FRAGMENT | CSUM_UDP > >alternately, if 4116 is hex, then it is:
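The decimal-versus-hex question above comes down to which bits are set in the printed value. A minimal sketch for splitting a value into its individual bit masks is shown below; split_flag_bits is a hypothetical helper, and it deliberately does not map masks to CSUM_* names, which would have to come from mbuf.h.

```c
#include <stddef.h>

/*
 * Store each set bit of value as its own mask, lowest bit first.
 * Returns the number of set bits; at most cap masks are written to out.
 */
size_t split_flag_bits(unsigned long value, unsigned long *out, size_t cap)
{
    size_t n = 0;

    for (unsigned long bit = 1; bit != 0 && bit <= value; bit <<= 1) {
        if (value & bit) {
            if (n < cap)
                out[n] = bit;
            n++;
        }
    }
    return n;
}
```

For 4116 read as decimal this yields the masks 0x4, 0x10 and 0x1000, i.e. 0x1014, the value the message above decodes. Read as hex, 0x4116 would instead split into 0x2, 0x4, 0x10, 0x100 and 0x4000.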

Re: 9.2 ixgbe tx queue hang (packets that exceed 65535bytes in length)

2014-03-21 Thread Rick Macklem
Christopher Forgeron wrote: > (Pardon me, for some reason my gmail is sending on my cut-n-pastes if > I cr > down too fast) > > First set of logs: > > Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116 Ok, so this isn't a TSO segment then, unless I don't understand how the csum

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Update: I've noticed a fair number of differences in the ixgbe driver between 9.2 and 10.0-RELEASE, even though they have the same 2.5.15 version. Mostly Netmap integration. I've loaded up a 9.2-STABLE ixgbe driver from Dec 25th as it was handy (I had to hack the source a bit since some #def's ha

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Fair enough. Have you tried disabling tso on the ix's ? That does fix the problem for me, however there is a performance penalty to be paid. I'm now regressing through the ixgbe drivers - I see there's been changes to how the queues are drained between 9.1 - 10.0, will see if the older ixgbe 2.4.

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Markus Gebert
On 21.03.2014, at 15:49, Christopher Forgeron wrote: > However, if you can make a spare tester of the same hardware, that's > perfect - And you can generate all the load you need with benchmark > software like iometer, large NFS copies, or perhaps a small replica of your > network. Synthetic lo

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Markus Gebert
On 21.03.2014, at 16:22, Christopher Forgeron wrote: > Markus, > > I don't know why I didn't notice this before.. I copied your cpuset ping > verbatim, not realizing that I should be using 172.16.0.x as that's my > network on the ix's > > On this tester box, 10.0.0.1 goes out a different inter

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Markus, I don't know why I didn't notice this before.. I copied your cpuset ping verbatim, not realizing that I should be using 172.16.0.x as that's my network on the ix's On this tester box, 10.0.0.1 goes out a different interface, thus it never reported back any problems. Now that I've corr

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
(Pardon me, for some reason my gmail is sending on my cut-n-pastes if I cr down too fast) First set of logs: Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116 Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542 Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl
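The mbcnt=33 in the log lines above is what one would expect if the packet were packed into full 2K clusters. A rough sketch of that arithmetic follows, assuming MCLBYTES is 2048 and that every mbuf in the chain carries one full cluster; the names are hypothetical and the real chain layout may differ.

```c
#define SK_MCLBYTES 2048u  /* assumed default value of MCLBYTES */

/* Number of full-size clusters needed to hold len bytes (rounded up). */
unsigned int clusters_needed(unsigned int len)
{
    return (len + SK_MCLBYTES - 1) / SK_MCLBYTES;
}

/* Non-zero when the chain would be longer than the driver's segment limit. */
int exceeds_chain_limit(unsigned int len, unsigned int max_segs)
{
    return clusters_needed(len) > max_segs;
}
```

Under these assumptions a 65542-byte packet needs 33 clusters, one more than the 32-mbuf limit mentioned elsewhere in this thread, while anything up to 65536 bytes fits in 32.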

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Rick: Unfortunately your patch didn't work. I expected as much as soon as I saw my boot time 'netstat -m', but I wanted to run the tests to make sure. First, here is where I put in your additional line - Let me know if that's what you were hoping for, as I'm using mmm->m_pkthdr.csum_flags, as m

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Ah, I understand the difficulties of testing production systems. However, if you can make a spare tester of the same hardware, that's perfect - And you can generate all the load you need with benchmark software like iometer, large NFS copies, or perhaps a small replica of your network. Synthetic

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Markus Gebert
On 21.03.2014, at 14:16, Christopher Forgeron wrote: > Hi Markus, > > Yes, we may have different problems, or perhaps the same problem is > manifesting itself in different ways in our systems. > > Have you tried a 10.0-RELEASE system yet? If we were on the same OS version, > we could then

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Hi Markus, Yes, we may have different problems, or perhaps the same problem is manifesting itself in different ways in our systems. Have you tried a 10.0-RELEASE system yet? If we were on the same OS version, we could then compare system specs a bit deeper, and see what is different. Perhaps un

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Markus Gebert
On 21.03.2014, at 12:47, Christopher Forgeron wrote: > Hello all, > > I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer away > at the NFS store overnight - But the problem is still there. > > From what I read, I think the MJUM9BYTES removal is probably good cleanup > (as lon

Re: 9.2 ixgbe tx queue hang

2014-03-21 Thread Christopher Forgeron
Hello all, I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer away at the NFS store overnight - But the problem is still there. From what I read, I think the MJUM9BYTES removal is probably good cleanup (as long as it doesn't trade performance on a lightly memory loaded system

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Pardon.. delay in recv'ing messages. I see your edits for 777-778 .. will attempt tomorrow.

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Sorry Rick, what's the small change you wanted in sys/netinet/tcp_output.c at 777-778? I see it's calc'ing length... or did you want me to take the whole file back to 9.1-RELEASE ? On Thu, Mar 20, 2014 at 11:47 PM, Rick Macklem wrote: > Christopher Forgeron wrote: > > Yes, there is something br

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
BTW - I think this will end up being a TSO issue, not the patch that Jack applied. When I boot Jack's patch (MJUM9BYTES removal) this is what netstat -m shows: 21489/2886/24375 mbufs in use (current/cache/total) 4080/626/4706/6127254 mbuf clusters in use (current/cache/total/max) 4080/587 mbuf+cl

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Rick Macklem
Christopher Forgeron wrote: > Yes, there is something broken in TSO for sure, as disabling it > allows me > to run without error. It is possible that the drop in performance is > allowing me to stay under a critical threshold for the problem, but > I'd > feel happier testing to make sure. > > I un

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Yes, there is something broken in TSO for sure, as disabling it allows me to run without error. It is possible that the drop in performance is allowing me to stay under a critical threshold for the problem, but I'd feel happier testing to make sure. I understand what you're asking for in the patch

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Rick Macklem
Christopher Forgeron wrote: > > > > > > > On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert < > markus.geb...@hostpoint.ch > wrote: > > > > > > Possible. We still see this on nfsclients only, but I’m not convinced > that nfs is the only trigger. > > Since Christopher is getting a bunch of

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Rick Macklem
Christopher Forgeron wrote: > > Output from the patch you gave me (I have screens of it.. let me know > what you're hoping to see. > > > Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538 > Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538 Hmm. I think this means that th

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Ah, good point about the 4k buff size : I will allocate more to kern.ipc.nmbjumbop , perhaps taking it from 9 and 16. Yes, I did have to tweak the patch slightly to work on 10.0, but it's basically the same thing I was trying after looking at Garrett's notes. I see this is part of a larger proble

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Jack Vogel
Your 4K mbuf pool is not being used, make sure you increase the size once you are using that or you'll just be having the same issue with a different pool. Oh, and that patch was against the code in HEAD, it might need some manual hacking if you're using anything older. Not sure what you mean abo

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
I agree, performance is noticeably worse with TSO off, but I thought it would be a good step in troubleshooting. I'm glad you're a regular reader of the list, so I don't have to settle for slow performance. :-) I'm applying your patch now, I think it will fix it - but I'll report in after it's run

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Jack Vogel
I strongly discourage anyone from disabling TSO on 10G, it's necessary to get the performance one wants to see on the hardware. Here is a patch to do what I'm talking about: *** ixgbe.c Fri Jan 10 18:12:20 2014 --- ixgbe.jfv.c Thu Mar 20 23:04:15 2014 *** ixgbe_init_locked(struct

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Hi Jack, I'm on ixgbe 2.5.15 I see a few other threads about using MJUMPAGESIZE instead of MJUM9BYTES. If you have a patch you'd like me to test, I'll compile it in and let you know. I was just looking at Garrett's if_em.c patch and thinking about applying it to ixgbe.. As it stands I seem

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Jack Vogel
What he's saying is that the driver should not be using 9K mbuf clusters, I thought this had been changed but I see the code in HEAD is still using the larger clusters when you up the mtu. I will put it on my list to change with the next update to HEAD. What version of ixgbe are you using? Jack

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Any recommendations on what to do? I'm experimenting with disabling TSO right now, but it's too early to tell if it fixes my problem. On my 9.2 box, we don't see this number climbing. With TSO off on 10.0, I also see the number is not climbing. I'd appreciate any links you may have so I can read

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
I have found this: http://lists.freebsd.org/pipermail/freebsd-net/2013-October/036955.html I think what you're saying is that; - a MTU of 9000 doesn't need to equal a 9k mbuf / jumbo cluster - modern NIC drivers can gather 9000 bytes of data from various memory locations - The fact that I'm seein

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Garrett Wollman
In article , csforge...@gmail.com writes: >50/27433/0 requests for jumbo clusters denied (4k/9k/16k) This is going to screw you. You need to make sure that no NIC driver ever allocates 9k jumbo pages -- unless you are using one of those mythical drivers that can't do scatter/gather DMA on receiv

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Re: cpuset ping I can report that I do not get any fails with this ping - I have screens of failed flood pings on the ix0 nic, but these always pass (i have that cpuset ping looping constantly). I can't report about the dtrace yet, as I'm running Rick's ixgbe patch, and there seems to be a .ko co

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
BTW, When I have the problem, this is what I see from netstat -m 4080/2956/7036/6127254 mbuf clusters in use (current/cache/total/max) 4080/2636 mbuf+clusters out of packet secondary zone in use (current/cache) 0/50/50/3063627 4k (page size) jumbo clusters in use (current/cache/total/max) 32768/

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Output from the patch you gave me (I have screens of it.. let me know what you're hoping to see. Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538 Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538 Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538 Mar 20
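When there are screens of this output, it can help to pull the numbers out mechanically. Below is a small sketch that parses one "after" line of the kind shown above and reports how far pklen exceeds a limit. The line format is inferred from the pasted output only, and struct xmit_sample, parse_after_line and overshoot are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

struct xmit_sample {
    int mbcnt;  /* number of mbufs in the chain */
    int pklen;  /* packet header length field */
    int actl;   /* actual summed length */
};

/* Parse an "after mbcnt=... pklen=... actl=..." line. Returns 1 on success. */
int parse_after_line(const char *line, struct xmit_sample *out)
{
    const char *p = strstr(line, "after mbcnt=");

    if (p == NULL)
        return 0;
    return sscanf(p, "after mbcnt=%d pklen=%d actl=%d",
                  &out->mbcnt, &out->pklen, &out->actl) == 3;
}

/* Bytes by which the packet length exceeds limit, or 0 if within it. */
int overshoot(const struct xmit_sample *s, int limit)
{
    return s->pklen > limit ? s->pklen - limit : 0;
}
```

For the first line pasted above this gives mbcnt 33 and pklen 65538, an overshoot of 3 bytes against a limit of 65535. The "before" lines do not match the pattern and are skipped.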

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Markus Gebert
On 20.03.2014, at 16:50, Christopher Forgeron wrote: > Markus, > > I just wanted to clarify what dtrace will output in a 'no-error' > situation. I'm seeing the following during a normal ping (no errors) on > ix0, or even on a non-problematic bge NIC: > > This is expected. This dtrace probe w

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
(Struggling with this mail client for some reason, sorry, here's the paste) # dtrace -n 'fbt:::return / arg1 == EFBIG && execname == "ping" / { stack(); }' dtrace: description 'fbt:::return ' matched 24892 probes CPU ID FUNCTION:NAME 19 29656 maybe_yield:retu

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
Markus, I just wanted to clarify what dtrace will output in a 'no-error' situation. I'm seeing the following during a normal ping (no errors) on ix0, or even on a non-problematic bge NIC: On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert wrote: > Also, if you have dtrace available: > > kldload

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Christopher Forgeron
On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert wrote: > > > Possible. We still see this on nfsclients only, but I'm not convinced that > nfs is the only trigger. > > Just to clarify, I'm experiencing this error with NFS, but also with iSCSI - I turned off my NFS server in rc.conf and rebooted, and

Re: 9.2 ixgbe tx queue hang

2014-03-20 Thread Markus Gebert
On 19.03.2014, at 20:17, Christopher Forgeron wrote: > Hello, > > > > I can report this problem as well on 10.0-RELEASE. > > > > I think it's the same as kern/183390? Possible. We still see this on nfsclients only, but I’m not convinced that nfs is the only trigger. > I have two physic

Re: 9.2 ixgbe tx queue hang

2014-03-19 Thread Rick Macklem
Christopher Forgeron wrote: > Hello, > > > > I can report this problem as well on 10.0-RELEASE. > > > > I think it's the same as kern/183390? > > > > I have two physically identical machines, one running 9.2-STABLE, and > one > on 10.0-RELEASE. > > > > My 10.0 machine used to be running

9.2 ixgbe tx queue hang

2014-03-19 Thread Christopher Forgeron
(Sorry for the formatting on that last message, that was weird) Today I wanted to test the assertion that this is a NFS issue, since we all seem to be running NFS. I shut down my NFS daemon in rc.conf, configured the FreeBSD10 iSCSI ctld, rebooted, and then ran all my tests exclusively from the

Re: 9.2 ixgbe tx queue hang

2014-03-19 Thread Christopher Forgeron
Hello, I can report this problem as well on 10.0-RELEASE. I think it's the same as kern/183390? I have two physically identical machines, one running 9.2-STABLE, and one on 10.0-RELEASE. My 10.0 machine used to be running 9.0-STABLE for over a year without any problems. I'm not havin

Re: 9.2 ixgbe tx queue hang (was: Network loss)

2014-03-07 Thread Markus Gebert
On 06.03.2014, at 22:38, Jack Vogel wrote: > I suppose to be sure where the issue really occurs it would be best if both > pseudo drivers > were out of the picture. Once we see if it still occurs we can take the > next step. > > Regards, > > Jack I removed lagg and vlan interfaces on four se

Re: 9.2 ixgbe tx queue hang

2014-03-06 Thread Dr. A. Haakh
Markus Gebert wrote: On 06.03.2014, at 19:33, Jack Vogel wrote: You did not make it explicit before, but I noticed in your dtrace info that you are using lagg, it's been the source of lots of problems, so take it out of the setup and see if this queue problem still happens please. Jack Wel
