Re: ixgbe(4) spin lock held too long

2015-01-17 Thread Jason Wolfe
On Mon, Dec 15, 2014 at 9:23 AM, John Baldwin wrote: > On Wednesday, December 10, 2014 12:47:02 PM Jason Wolfe wrote: >> John, >> >> So apparently the concurrent timer scheduling was not fixed, though it >> does seem rarer. We had about 2 weeks of stability, then last night >> we had a crash on a

Re: ixgbe(4) spin lock held too long

2014-10-24 Thread John Baldwin
On Thursday, October 23, 2014 02:12:44 PM Jason Wolfe wrote: > On Sat, Oct 18, 2014 at 4:42 AM, John Baldwin wrote: > > On Friday, October 17, 2014 11:32:13 PM Jason Wolfe wrote: > >> Producing 10G of random traffic against a server with this assertion > >> added took about 2 hours to panic, so if

Re: ixgbe(4) spin lock held too long

2014-10-23 Thread Jason Wolfe
On Sat, Oct 18, 2014 at 4:42 AM, John Baldwin wrote: > On Friday, October 17, 2014 11:32:13 PM Jason Wolfe wrote: >> Producing 10G of random traffic against a server with this assertion >> added took about 2 hours to panic, so if it turns out we need anything >> further it should be pretty quick.

Re: ixgbe(4) spin lock held too long

2014-10-18 Thread John Baldwin
On Friday, October 17, 2014 11:43:26 PM Adrian Chadd wrote: > Hm, is this the bug that was just fixed in -HEAD? > > I saw this similar bug on -HEAD with lots of quick connections and > reused ports. It ended up dereferencing a NULL tcp timer pointer from > the inpcb. Is that what the code in your tr

Re: ixgbe(4) spin lock held too long

2014-10-18 Thread John Baldwin
On Friday, October 17, 2014 11:32:13 PM Jason Wolfe wrote: > Producing 10G of random traffic against a server with this assertion > added took about 2 hours to panic, so if it turns out we need anything > further it should be pretty quick. > > #4 list > 2816 * timer and remembe

Re: ixgbe(4) spin lock held too long

2014-10-17 Thread Adrian Chadd
Hm, is this the bug that was just fixed in -HEAD? I saw this similar bug on -HEAD with lots of quick connections and reused ports. It ended up dereferencing a NULL tcp timer pointer from the inpcb. Is that what the code in your tree is doing? -a On 17 October 2014 23:32, Jason Wolfe wrote: > On

Re: ixgbe(4) spin lock held too long

2014-10-17 Thread Jason Wolfe
On Thu, Oct 16, 2014 at 12:23 PM, John Baldwin wrote: > > > I looked at the other trace and I don't think it disagrees with my previous > theory. I do have more KTR patches to log when we spin on locks which would > really confirm this, but I haven't tested those fully on HEAD yet. > > However, I

Re: ixgbe(4) spin lock held too long

2014-10-16 Thread John Baldwin
On Saturday, October 11, 2014 2:19:19 am Jason Wolfe wrote: > On Fri, Oct 10, 2014 at 8:53 AM, John Baldwin wrote: > > > On Thursday, October 09, 2014 02:31:32 PM Jason Wolfe wrote: > > > On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin wrote: > > > > My only other thought is if a direct timeout ro

Re: ixgbe(4) spin lock held too long

2014-10-13 Thread Jason Wolfe
On Fri, Oct 10, 2014 at 11:19 PM, Jason Wolfe wrote: > On Fri, Oct 10, 2014 at 8:53 AM, John Baldwin wrote: > >> On Thursday, October 09, 2014 02:31:32 PM Jason Wolfe wrote: >> > On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin wrote: >> > > My only other thought is if a direct timeout routine ran

Re: ixgbe(4) spin lock held too long

2014-10-10 Thread Jason Wolfe
On Fri, Oct 10, 2014 at 8:53 AM, John Baldwin wrote: > On Thursday, October 09, 2014 02:31:32 PM Jason Wolfe wrote: > > On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin wrote: > > > My only other thought is if a direct timeout routine ran for a long > time. > > > > > > I just committed a change to

Re: ixgbe(4) spin lock held too long

2014-10-10 Thread John Baldwin
On Thursday, October 09, 2014 02:31:32 PM Jason Wolfe wrote: > On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin wrote: > > My only other thought is if a direct timeout routine ran for a long time. > > > > I just committed a change to current that can let you capture KTR traces > > of > > callout rou

Re: ixgbe(4) spin lock held too long

2014-10-09 Thread Jason Wolfe
On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin wrote: > My only other thought is if a direct timeout routine ran for a long time. > > I just committed a change to current that can let you capture KTR traces of > callout routines for use with schedgraph (r272757). Unfortunately, > enabling KTR_SCH

Re: ixgbe(4) spin lock held too long

2014-10-08 Thread John Baldwin
On Wednesday, October 08, 2014 10:56:56 AM Jason Wolfe wrote: > On Tue, Oct 7, 2014 at 11:28 AM, John Baldwin wrote: > > On Tuesday, October 07, 2014 2:06:42 pm Jason Wolfe wrote: > > > Hey John, > > > > > > Happy to do this, but the pool of boxes is about 500 large, which is the > > > reason I'm

Re: ixgbe(4) spin lock held too long

2014-10-08 Thread Jason Wolfe
On Tue, Oct 7, 2014 at 11:28 AM, John Baldwin wrote: > On Tuesday, October 07, 2014 2:06:42 pm Jason Wolfe wrote: > > Hey John, > > > > Happy to do this, but the pool of boxes is about 500 large, which is the > > reason I'm able to see a crash every day or so. I've pulled a portion of > > them o

Re: ixgbe(4) spin lock held too long

2014-10-03 Thread John Baldwin
On Thursday, October 02, 2014 06:40:21 PM Jason Wolfe wrote: > On Wed, Sep 10, 2014 at 8:24 AM, John Baldwin wrote: > > On Monday, September 08, 2014 03:34:02 PM Eric van Gyzen wrote: > > > On 09/08/2014 15:19, Sean Bruno wrote: > > > > On Mon, 2014-09-08 at 12:09 -0700, Sean Bruno wrote: > > > >>

Re: ixgbe(4) spin lock held too long

2014-10-02 Thread Jason Wolfe
On Wed, Sep 10, 2014 at 8:24 AM, John Baldwin wrote: > On Monday, September 08, 2014 03:34:02 PM Eric van Gyzen wrote: > > On 09/08/2014 15:19, Sean Bruno wrote: > > > On Mon, 2014-09-08 at 12:09 -0700, Sean Bruno wrote: > > >> This sort of looks like the hardware failed to respond to us in time?

Re: ixgbe(4) spin lock held too long

2014-09-10 Thread John Baldwin
On Monday, September 08, 2014 03:34:02 PM Eric van Gyzen wrote: > On 09/08/2014 15:19, Sean Bruno wrote: > > On Mon, 2014-09-08 at 12:09 -0700, Sean Bruno wrote: > >> This sort of looks like the hardware failed to respond to us in time? > >> Too busy? > >> > >> sean > > > > This seems to be affec

Re: ixgbe(4) spin lock held too long

2014-09-08 Thread Sean Bruno
On Mon, 2014-09-08 at 15:34 -0400, Eric van Gyzen wrote: > >> Unread portion of the kernel message buffer: > >> spin lock 0x812a0400 (callout) held by 0xf800151fe000 > (tid > >> 13) too long > > TID 13 is usually a kernel idle thread, which would seem to > indicate > a dangling
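The panic text quoted in this message has the form "spin lock ADDR (NAME) held by OWNER (tid N) too long": a CPU spun on a spin mutex past some bound and then reported whoever was recorded as the owner. The block below is a minimal, simplified user-space C11 model of that behavior, not FreeBSD's mutex(9) code. `struct spin_lock`, `spin_lock_acquire`, `spin_lock_release`, and the `spin_limit` parameter are hypothetical names chosen for illustration, and the owner is reduced to a bare thread id rather than a thread pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified spin lock; NOT FreeBSD's struct mtx. */
struct spin_lock {
    atomic_uintptr_t owner; /* 0 when free, otherwise the holder's thread id */
    const char *name;       /* lock name printed in the diagnostic */
};

/*
 * Spin until the lock is free or spin_limit attempts have failed.
 * Returns 0 on success, -1 if the holder kept it "too long".
 */
static int spin_lock_acquire(struct spin_lock *l, uintptr_t tid,
                             unsigned long spin_limit)
{
    for (unsigned long i = 0; i < spin_limit; i++) {
        uintptr_t expected = 0;
        /* Atomically claim the lock only if nobody currently owns it. */
        if (atomic_compare_exchange_weak(&l->owner, &expected, tid))
            return 0;
    }
    /* Report the recorded owner, loosely mirroring the kernel message. */
    fprintf(stderr, "spin lock %p (%s) held by tid %lu too long\n",
            (void *)l, l->name, (unsigned long)atomic_load(&l->owner));
    return -1;
}

static void spin_lock_release(struct spin_lock *l, uintptr_t tid)
{
    /* Only the recorded owner may release the lock (KASSERT-style check). */
    assert(atomic_load(&l->owner) == tid);
    atomic_store(&l->owner, 0);
}
```

In this model, if one thread takes the lock and never releases it (a "dangling" owner, which is what Eric suspects when the reported holder is TID 13), every later caller exhausts its spin budget and reports that stale owner; once the owner releases, the next acquire succeeds immediately.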

Re: ixgbe(4) spin lock held too long

2014-09-08 Thread Eric van Gyzen
On 09/08/2014 15:19, Sean Bruno wrote: > On Mon, 2014-09-08 at 12:09 -0700, Sean Bruno wrote: >> This sort of looks like the hardware failed to respond to us in time? >> Too busy? >> >> sean >> > This seems to be affecting my 10/stable machines from 15Aug2014. > > Not a lot of churn in the code s

Re: ixgbe(4) spin lock held too long

2014-09-08 Thread Sean Bruno
On Mon, 2014-09-08 at 12:09 -0700, Sean Bruno wrote: > This sort of looks like the hardware failed to respond to us in time? > Too busy? > > sean > This seems to be affecting my 10/stable machines from 15Aug2014. Not a lot of churn in the code so I don't think this is new. The afflicted machi

ixgbe(4) spin lock held too long

2014-09-08 Thread Sean Bruno
This sort of looks like the hardware failed to respond to us in time? Too busy? sean panic: spin lock held too long GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or dist