Re: if_sge related panics

Nikolay Denev Wed, 02 Jun 2010 23:29:53 -0700

On May 24, 2010, at 8:12 PM, Pyun YongHyeon wrote:

> On Mon, May 24, 2010 at 09:48:33AM -0400, John Baldwin wrote:
>> On Monday 24 May 2010 6:35:01 am Nikolay Denev wrote:
>>> On May 24, 2010, at 8:57 AM, Nikolay Denev wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Recently I started to experience an if_sge(4)-related panic.
>>>> It happens almost every time I try to download a torrent file, for example.
>>>> Copying large files over NFS does not seem to trigger it, but I haven't 
>>>> tested extensively.
>>>> 
>>>> Here is the panic message :
>>>> 
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 00
>>>> fault virtual address   = 0x8
>>>> fault code              = supervisor write data, page not present
>>>> instruction pointer     = 0x20:0xffffffff80230413
>>>> stack pointer           = 0x28:0xffffff80001e9280
>>>> frame pointer           = 0x28:0xffffff80001e9510
>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>> current process         = 12 (irq19: sge0)
>>>> trap number             = 12
>>>> panic: page fault
>>>> cpuid = 0
>>>> Uptime: 1d20h56m20s
>>>> Cannot dump. Device not defined or unavailable
>>>> Automatic reboot in 15 seconds - press a key on the console to abort
>>>> Sleeping thread (tid 100039, pid 12) owns a non-sleepable lock
>>>> 
>>>> My swap is on a zvol, so I don't have a dump. I'll try to attach a disk to 
>>>> the eSATA port and dump there if needed.
>>> 
>>> Here is some info from the crashdump :
>>> 
>>> (kgdb) #0  doadump () at pcpu.h:223
>>> #1  0xffffffff802fb149 in boot (howto=260)
>>>    at /usr/src/sys/kern/kern_shutdown.c:416
>>> #2  0xffffffff802fb57c in panic (fmt=0xffffffff8055d564 "%s")
>>>    at /usr/src/sys/kern/kern_shutdown.c:590
>>> #3  0xffffffff805055b8 in trap_fatal (frame=0xffffff000288a3e0, 
>>> eva=Variable "eva" is not available.
>>> )
>>>    at /usr/src/sys/amd64/amd64/trap.c:777
>>> #4  0xffffffff805059dc in trap_pfault (frame=0xffffff80001e91d0, usermode=0)
>>>    at /usr/src/sys/amd64/amd64/trap.c:693
>>> #5  0xffffffff805061c5 in trap (frame=0xffffff80001e91d0)
>>>    at /usr/src/sys/amd64/amd64/trap.c:451
>>> #6  0xffffffff804eb977 in calltrap ()
>>>    at /usr/src/sys/amd64/amd64/exception.S:223
>>> #7  0xffffffff80230413 in sge_start_locked (ifp=0xffffff000270d800)
>>>    at /usr/src/sys/dev/sge/if_sge.c:1591
>> 
>> Try this.  sge_encap() can sometimes return an error with m_head set to NULL:
>> 
> 
> Thanks John. Committed in r208512.
> 
>> Index: if_sge.c
>> ===================================================================
>> --- if_sge.c (revision 208375)
>> +++ if_sge.c (working copy)
>> @@ -1588,7 +1588,8 @@
>>              if (m_head == NULL)
>>                      break;
>>              if (sge_encap(sc, &m_head)) {
>> -                    IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>> +                    if (m_head != NULL)
>> +                            IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>                      ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>                      break;
>>              }
>> 
>> -- 
>> John Baldwin


After applying the patch I experienced several network outages (ping reported
"No buffer space available") that were resolved by bringing the sge(4)
interface down and up again with ifconfig.

Most of the other drivers that handle XXX_encap() returning an error with
m_head set to NULL break out of the loop as soon as this condition is hit,
i.e.:

Index: if_sge.c
===================================================================
--- if_sge.c    (revision 208375)
+++ if_sge.c    (working copy)
@@ -1588,7 +1588,9 @@
                if (m_head == NULL)
                        break;
                if (sge_encap(sc, &m_head)) {
+                       if (m_head == NULL)
+                               break;
                        IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
                        ifp->if_drv_flags |= IFF_DRV_OACTIVE;
                        break;
                }

But in sge(4), with the committed patch, we still set IFF_DRV_OACTIVE even
when m_head is NULL.
Do you think this could be the source of the problem?
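
To make the concern concrete, here is a toy userland model of the start loop.
It is only a sketch of one possible explanation, not the driver code: the
names toy_if, toy_start, toy_txeof and the encap_result values are all
hypothetical, and it assumes that the "output active" flag is cleared only
when a transmit completion arrives for frames already handed to the hardware.
Under that assumption, setting the flag after the packet was dropped (m_head
NULL) with nothing pending leaves the queue stalled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in if_sge.c. */
struct toy_if {
	int  queued;      /* packets waiting in the send queue */
	int  tx_pending;  /* frames given to the "hardware", not yet completed */
	bool oactive;     /* stands in for IFF_DRV_OACTIVE */
};

enum encap_result {
	ENCAP_OK,         /* packet accepted */
	ENCAP_RING_FULL,  /* error, packet still valid (m_head != NULL) */
	ENCAP_DROPPED     /* error, packet freed (m_head == NULL) */
};

/*
 * Simplified start loop. 'res' is the result every encap attempt returns;
 * any failure ends the loop. 'break_on_null' selects the behaviour of the
 * second diff (break before touching the flag when the packet is gone).
 */
void
toy_start(struct toy_if *ifp, enum encap_result res, bool break_on_null)
{
	assert(ifp != NULL);
	if (ifp->oactive)
		return;                 /* transmitter marked busy: do nothing */
	while (ifp->queued > 0) {
		ifp->queued--;          /* dequeue one packet */
		if (res == ENCAP_OK) {
			ifp->tx_pending++;
			continue;
		}
		if (res == ENCAP_RING_FULL)
			ifp->queued++;      /* put the packet back */
		else if (break_on_null)
			break;              /* packet gone: leave the flag alone */
		ifp->oactive = true;
		break;
	}
}

/* Assumed completion path: runs only if something was actually sent. */
void
toy_txeof(struct toy_if *ifp)
{
	assert(ifp != NULL);
	if (ifp->tx_pending > 0) {
		ifp->tx_pending = 0;
		ifp->oactive = false;
	}
}
```

In this model, calling toy_start() with ENCAP_DROPPED and break_on_null set
to false on an idle interface sets the flag, toy_txeof() never clears it, and
later toy_start() calls return immediately while the queue stays full. With
break_on_null set to true the next toy_start() call drains the queue.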

Regards,
Niki
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
