Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-10-27 Thread Emil Muratov



Hi All,
I've got almost the same problem with an Intel 82574L-based NIC. My platform is
an NVIDIA ION running an Atom 1.6, and the NIC is an external PCI Express adapter.
Unlike Jason's case, mine always gets stuck receiving traffic: Ierrs keeps
increasing while Ipkts does not. Thanks to Jason's script I can see those lockups
and the interface flapping every several hours. My system is not a heavily loaded
server, just a home NAS/router, usually routing at 100 Mbps or less.
Neither disabling MSIX nor tuning txd/rxd helps.


Hi,

When this occurs, does RX lock up completely? I.e., does
_no_ valid RX traffic arrive?
If so, does the error count increase 1:1 with the traffic you're
trying to send to it?


Hi Adrian
Yes, it's completely locked: ping packet loss to the interface is 100%.
I also managed to run tcpdump -ni em0, and it showed only outgoing
packets from the interface and no packets arriving at the interface. As
for the error counter ratio, I'm not sure about it. I don't know how to
check this besides comparing the counters on the switch with those on the
interface, and that is a little complicated to catch in time.
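
One way to estimate the ratio without racing the switch counters is to read
the em0 Ipkts and Ierrs columns from netstat twice while a peer sends a known
number of packets (a fixed-count ping, say) and compare the deltas. This is
a minimal sketch under that assumption; the function name is hypothetical,
and the "after" sample is made up for illustration, not measured data.

```python
def rx_error_ratio(before, after, sent):
    """Compare Ipkts/Ierrs growth between two samples against packets sent.

    before/after are (ipkts, ierrs) tuples read from netstat for one
    interface; sent is how many packets the peer transmitted in between.
    """
    d_pkts = after[0] - before[0]
    d_errs = after[1] - before[1]
    return {
        "ipkts_delta": d_pkts,
        "ierrs_delta": d_errs,
        # 1.0 means every packet sent was counted as an input error
        "errs_per_sent": d_errs / sent if sent else None,
    }

# Hypothetical samples: 100 pings sent while the interface is wedged
print(rx_error_ratio((5465087, 623), (5465087, 723), 100))
```

A result near 1.0 would suggest that every frame reaching the NIC is being
counted as an error rather than silently lost before it.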




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-10-27 Thread Hooman Fazaeli

On 10/27/2011 9:59 AM, Emil Muratov wrote:


Hi Hooman

Here is what I've got when the script triggered just in time when the interface 
was locked


11.10.26-23:39:10 ... interface em0 is down...

FreeBSD ion.hotplug.ru 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Oct 20 20:20:25 MSD 2011 r...@epia.home.lan:/usr/obj/usr/src/sys/ION6debug  amd64
11:39PM  up  1:12, 2 users, load averages: 0.26, 0.48, 0.58


 == vmstat -i ==
interrupt                          total       rate
irq22: nfe0                     16644480       3865
cpu0: timer                      8610122       1999
irq256: ahci0                     606705        140
irq257: em0:rx 0                 3896622        904
irq258: em0:tx 0                 2762957        641
irq259: em0:link                     620          0
cpu3: timer                      8609499       1999
cpu1: timer                      8609499       1999
cpu2: timer                      8609499       1999
Total                           58350003      13550

 == netstat -ind ==
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll Drop
usbus     0                                        0     0     0        0     0     0    0
usbus     0                                        0     0     0        0     0     0    0
nfe0   1500               00:25:22:21:86:89  7157140     0     0 12266747     0     0    0
nfe0   1500 fe80::225:22f fe80::225:22ff:fe        0     -     -       85     -     -    -
nfe0   1500 10.16.128.0/1 10.16.189.71             0     -     -    48135     -     -    -
em0    9000               00:1b:21:ab:bf:4a  5465087   623     0  2862028     0     0  113
em0    9000 192.168.168.0 192.168.168.1       764085     -     -  1005078     -     -    -
em0    9000 fe80::21b:21f fe80::21b:21ff:fe       45     -     -      252     -     -    -
em0    9000 2002:d58d:871 2002:d58d:8715:1:       73     -     -       38     -     -    -
wifi   1500               00:1b:21:ab:bf:4a      347     0     0      350     0     0    0
wifi   1500 192.168.168.6 192.168.168.65           0     -     -        0     -     -    -
wifi   1500 fe80::225:x   fe80::225:x:x            0     -     -      349     -     -    -
wifi   1500 2002:x:x      2002:x:x:2:              0     -     -        0     -     -    -
wifio  1500               00:1b:21:ab:bf:4a    59559     0     0   114639     0     0    0
wifio  1500 192.168.168.8 192.168.168.81           0     -     -      160     -     -    -
wifio  1500 fe80::225:x   fe80::225:x:x            0     -     -        0     -     -    -
stf0   1280                                     5725     0     0     6125   420     0    0
stf0   1280 2002:x:x      2002:x:x::1           1878     -     -     1121     -     -    -
ng0*   1500                                        0     0     0        0     0     0    0
ng1*   1500                                        0     0     0        0     0     0    0
ng2    1492                                  7143733     0     0 12234436     0     0    0
ng2    1492 213.141.x.x   213.141.x.x        4735932     -     -  8480089     -     -    -
ng2    1492 fe80::x:x     fe80::x:x:x              0     -     -        1     -     -    -
tun0   1455                                      350     0     0      172     0     0    0
tun0   1455 fe80::225:x   fe80::225:x:x            0     -     -        2     -     -    -
tun0   1455 192.168.169.1 192.168.169.1          117     -     -      167     -     -    -

Oct 26 23:39:11 ion kernel: em0: hw tdh = 975, hw tdt = 944
Oct 26 23:39:11 ion kernel: em0: hw rdh = 960, hw rdt = 959
Oct 26 23:39:11 ion kernel: em0: Tx Queue Status = 1
Oct 26 23:39:11 ion kernel: em0: TX descriptors avail = 31
Oct 26 23:39:11 ion kernel: em0: Tx Descriptors avail failure = 0
Oct 26 23:39:11 ion kernel: em0: RX discarded packets = 0
Oct 26 23:39:11 ion kernel: em0: RX Next to Check = 960
Oct 26 23:39:11 ion kernel: em0: RX Next to Refresh = 959

net.inet.ip.intr_queue_maxlen: 4096
net.inet.ip.intr_queue_drops: 0
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0xa01f class=0x02
dev.em.0.%parent: pci2
dev.em.0.nvm: -1
dev.em.0.debug: -1
dev.em.0.rx_int_delay: 200
dev.em.0.tx_int_delay: 200
dev.em.0.rx_abs_int_delay: 4096
dev.em.0.tx_abs_int_delay: 4096
dev.em.0.rx_processing_limit: 100
dev.em.0.flow_control: 3
dev.em.0.eee_control: 0
dev.em.0.link_irq: 648
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.rx_overruns: 0
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1477444168
dev.em.0.rx_control: 100827170
dev.em.0.fc_high_water: 11264
dev.em.0.fc_low_water: 9764
dev.em.0.queue0.txd_head: 975
dev.em.0.queue0.txd_tail: 944
dev.em.0.queue0.tx_irq: 2762762
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 960
dev.em.0.queue0.rxd_tail: 959
dev.em.0.queue0.rx_irq: 3895860
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.

Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-10-27 Thread Lev Serebryakov
Hello, Mike.
You wrote on 7 October 2011, 19:06:34:

> This sure sounds like the issue I was seeing with the 7.1.9 driver...
> However, it has been fixed for me by going to 7.2.3, which is in
> RELENG_8.  Is it possible you have a couple of issues going on since you
> are using lagg as well?  Another problem some folks have reported is
> that in the BIOS, if you have an option for ASPM, make sure it's disabled.
  I had a lot of such problems with 7.1.9 on my 82566DM, and I
thought the new driver was OK, but yesterday it happened again with
7.2.3.

  No packets could be sent, the buffers overfilled, and only a full reset
helps (after "ifconfig em0 down && ifconfig em0 up", ping starts to
report "Host is down" for any remote host instead of "No buffer space
available")...

  8-STABLE, 7.2.3 driver, amd64, 82566DM LOM chip.

-- 
// Black Lion AKA Lev Serebryakov 



Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-10-27 Thread Steven Hartland

What did netstat -m show?

   Regards
   Steve

- Original Message - 
From: "Lev Serebryakov" 

To: "Mike Tancsa" 
Cc: 
Sent: Thursday, October 27, 2011 10:26 AM
Subject: Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled


Hello, Mike.
You wrote on 7 October 2011, 19:06:34:


> This sure sounds like the issue I was seeing with the 7.1.9 driver...
> However, it has been fixed for me by going to 7.2.3, which is in
> RELENG_8.  Is it possible you have a couple of issues going on since you
> are using lagg as well?  Another problem some folks have reported is
> that in the BIOS, if you have an option for ASPM, make sure it's disabled.

 I had a lot of such problems with 7.1.9 on my 82566DM, and I
thought the new driver was OK, but yesterday it happened again with
7.2.3.

 No packets could be sent, the buffers overfilled, and only a full reset
helps (after "ifconfig em0 down && ifconfig em0 up", ping starts to
report "Host is down" for any remote host instead of "No buffer space
available")...

 8-STABLE, 7.2.3 driver, amd64, 82566DM LOM chip.

--
// Black Lion AKA Lev Serebryakov 



Re[2]: PCI-E VT6130 NIC (if_vge) hang system with gigabit link

2011-10-27 Thread Andrey Smagin



26 October 2011, 20:49, from YongHyeon PYUN:
> On Wed, Oct 26, 2011 at 01:49:14PM +0400, Andrey Smagin wrote:
> > Hi !
> > vge0@pci0:2:0:0:class=0x02 card=0x01101106 chip=0x31191106 rev=0x82 hdr=0x00
> > vendor = 'VIA Technologies, Inc.'
> > device = 'VT6120/VT6121/VT6122 Gigabit Ethernet Adapter'
> > class  = network
> > subclass   = ethernet
> > bar   [10] = type I/O Port, range 32, base 0x8000, size 256, enabled
> > bar   [14] = type Memory, range 64, base 0xd410, size 256, enabled
> > cap 01[50] = powerspec 3  supports D0 D1 D2 D3  current D0
> > cap 10[90] = PCI-Express 1 endpoint max data 128(128) link x1(x1)
> > cap 05[c0] = MSI supports 1 message, 64 bit, vector masks enabled with 1 message
> > ecap 0001[100] = AER 1 0 fatal 1 non-fatal 6 corrected
> > ecap 0003[130] = Serial 1 1106
> > vge1@pci0:1:0:0:class=0x02 card=0x01101106 chip=0x31191106 rev=0x82 hdr=0x00
> > vendor = 'VIA Technologies, Inc.'
> > device = 'VT6120/VT6121/VT6122 Gigabit Ethernet Adapter'
> > class  = network
> > subclass   = ethernet
> > bar   [10] = type I/O Port, range 32, base 0x7000, size 256, enabled
> > bar   [14] = type Memory, range 64, base 0xd400, size 256, enabled
> > cap 01[50] = powerspec 3  supports D0 D1 D2 D3  current D0
> > cap 10[90] = PCI-Express 1 endpoint max data 128(128) link x1(x1)
> > cap 05[c0] = MSI supports 1 message, 64 bit, vector masks enabled with 1 message
> > ecap 0001[100] = AER 1 1 fatal 1 non-fatal 6 corrected
> > ecap 0003[130] = Serial 1 1106
> >
> > dmesg is empty


 
> Hmm, check whether you have old dmesg file in /var/log directory.
> At least, I need output of 'devinfo -rv' to know which PHY you have.

pcib4 pnpinfo vendor=0x10de device=0x005d subvendor=0x subdevice=0x class=0x060400 at slot=13 function=0 handle=\_SB_.PCI0.XVR1
I/O ports:
0x8000-0x8fff
I/O memory addresses:
0xd410-0xd41f
  pci2
vge0 pnpinfo vendor=0x1106 device=0x3119 subvendor=0x1106 subdevice=0x0110 class=0x02 at slot=0 function=0
Interrupt request lines:
256
pcib4 I/O port window:
0x8000-0x80ff
pcib4 memory window:
0xd410-0xd41000ff
  miibus1
ip1000phy0 pnpinfo oui=0x9c3 model=0x19 rev=0x0 at phyno=1
pcib5 pnpinfo vendor=0x10de device=0x005d subvendor=0x subdevice=0x class=0x060400 at slot=14 function=0 handle=\_SB_.PCI0.XVR0
I/O ports:
0x7000-0x7fff
I/O memory addresses:
0xd400-0xd40f
  pci1
vge1 pnpinfo vendor=0x1106 device=0x3119 subvendor=0x1106 subdevice=0x0110 class=0x02 at slot=0 function=0
Interrupt request lines:
257
pcib5 I/O port window:
0x7000-0x70ff
pcib5 memory window:
0xd400-0xd4ff
  miibus2
ip1000phy1 pnpinfo oui=0x9c3 model=0x19 rev=0x0 at phyno=1

> By chance, are you using manual link configuration instead of
> relying on auto-negotiation?

I tried both manual link configuration and auto-negotiation. I no longer
remember which mode hangs the system. I will retest and report the result.



> 
> >
> > 26 October 2011, 00:11, from YongHyeon PYUN:
> > > On Tue, Oct 25, 2011 at 10:44:48AM +0400, Andrey Smagin wrote:
> > > > Hi all! If I connect a gigabit switch to my card, the system hangs until I
> > > > unplug the patch cord from the device.
> > > > With a 100 Mbit switch the card works fine.
> > >
> > > Show me the output of dmesg and 'pciconf -lcbv'.
> > >
> > > > System:  Current - r226163
> > > >
> > >
> >

Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-10-27 Thread Lev Serebryakov
Hello, Steven.
You wrote on 27 October 2011, 13:49:29:

> What did netstat -m show?
  Nothing suspicious :(

13414/2921/16335 mbufs in use (current/cache/total)
4997/533/5530/204800 mbuf clusters in use (current/cache/total/max)
4626/329 mbuf+clusters out of packet secondary zone in use (current/cache)
78/2976/3054/192000 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
13659K/13700K/27359K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
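
The lines in this output that would point at mbuf exhaustion are the "denied"
and "delayed" counters, and all of them are zero here. Below is a small sketch
of checking them mechanically. It assumes the output has exactly the line
shapes shown above, and the function name is hypothetical.

```python
import re

def denied_counters(netstat_m_output):
    """Return {description: [counts]} for every 'denied' or 'delayed' line."""
    result = {}
    for line in netstat_m_output.splitlines():
        m = re.match(r"^([\d/]+) (requests for .*? (?:denied|delayed))",
                     line.strip())
        if m:
            result[m.group(2)] = [int(n) for n in m.group(1).split("/")]
    return result

# Lines copied from the output above
sample = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
"""
counters = denied_counters(sample)
# True if any allocation request was ever refused or delayed
exhausted = any(n > 0 for counts in counters.values() for n in counts)
print(counters, exhausted)
```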


-- 
// Black Lion AKA Lev Serebryakov 



Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-10-27 Thread Emil Muratov




Hi Hooman

Here is what I've got when the script triggered just in time when the 
interface was locked



11.10.26-23:39:10 ... interface em0 is down...

FreeBSD ion.hotplug.ru 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Oct 20 20:20:25 MSD 2011 r...@epia.home.lan:/usr/obj/usr/src/sys/ION6debug  amd64
11:39PM  up  1:12, 2 users, load averages: 0.26, 0.48, 0.58


 == vmstat -i ==
interrupt                          total       rate
irq22: nfe0                     16644480       3865
cpu0: timer                      8610122       1999
irq256: ahci0                     606705        140
irq257: em0:rx 0                 3896622        904
irq258: em0:tx 0                 2762957        641
irq259: em0:link                     620          0
cpu3: timer                      8609499       1999
cpu1: timer                      8609499       1999
cpu2: timer                      8609499       1999
Total                           58350003      13550

 == netstat -ind ==
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll Drop
usbus     0                                        0     0     0        0     0     0    0
usbus     0                                        0     0     0        0     0     0    0
nfe0   1500               00:25:22:21:86:89  7157140     0     0 12266747     0     0    0
nfe0   1500 fe80::225:22f fe80::225:22ff:fe        0     -     -       85     -     -    -
nfe0   1500 10.16.128.0/1 10.16.189.71             0     -     -    48135     -     -    -
em0    9000               00:1b:21:ab:bf:4a  5465087   623     0  2862028     0     0  113
em0    9000 192.168.168.0 192.168.168.1       764085     -     -  1005078     -     -    -
em0    9000 fe80::21b:21f fe80::21b:21ff:fe       45     -     -      252     -     -    -
em0    9000 2002:d58d:871 2002:d58d:8715:1:       73     -     -       38     -     -    -
wifi   1500               00:1b:21:ab:bf:4a      347     0     0      350     0     0    0
wifi   1500 192.168.168.6 192.168.168.65           0     -     -        0     -     -    -
wifi   1500 fe80::225:x   fe80::225:x:x            0     -     -      349     -     -    -
wifi   1500 2002:x:x      2002:x:x:2:              0     -     -        0     -     -    -
wifio  1500               00:1b:21:ab:bf:4a    59559     0     0   114639     0     0    0
wifio  1500 192.168.168.8 192.168.168.81           0     -     -      160     -     -    -
wifio  1500 fe80::225:x   fe80::225:x:x            0     -     -        0     -     -    -
stf0   1280                                     5725     0     0     6125   420     0    0
stf0   1280 2002:x:x      2002:x:x::1           1878     -     -     1121     -     -    -
ng0*   1500                                        0     0     0        0     0     0    0
ng1*   1500                                        0     0     0        0     0     0    0
ng2    1492                                  7143733     0     0 12234436     0     0    0
ng2    1492 213.141.x.x   213.141.x.x        4735932     -     -  8480089     -     -    -
ng2    1492 fe80::x:x     fe80::x:x:x              0     -     -        1     -     -    -
tun0   1455                                      350     0     0      172     0     0    0
tun0   1455 fe80::225:x   fe80::225:x:x            0     -     -        2     -     -    -
tun0   1455 192.168.169.1 192.168.169.1          117     -     -      167     -     -    -


Oct 26 23:39:11 ion kernel: em0: hw tdh = 975, hw tdt = 944
Oct 26 23:39:11 ion kernel: em0: hw rdh = 960, hw rdt = 959
Oct 26 23:39:11 ion kernel: em0: Tx Queue Status = 1
Oct 26 23:39:11 ion kernel: em0: TX descriptors avail = 31
Oct 26 23:39:11 ion kernel: em0: Tx Descriptors avail failure = 0
Oct 26 23:39:11 ion kernel: em0: RX discarded packets = 0
Oct 26 23:39:11 ion kernel: em0: RX Next to Check = 960
Oct 26 23:39:11 ion kernel: em0: RX Next to Refresh = 959
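
The head/tail values in the kernel messages above are indices into circular
descriptor rings, so the useful number is the distance between them modulo
the ring size. The sketch below does that arithmetic. The ring size of 1024
is an assumption (the thread never states it; the indices only show it is
larger than 975), and how the driver itself derives "TX descriptors avail"
is not shown in the thread, so the match with the logged value of 31 is an
observation, not a statement about the driver's code.

```python
RING_SIZE = 1024  # assumption: not stated in the thread, only known to exceed 975

def ring_distance(start, end, size=RING_SIZE):
    """Number of slots walked going forward from start to end on a circular ring."""
    return (end - start) % size

# Values from the kernel messages above
tdh, tdt = 975, 944
rdh, rdt = 960, 959

tx_gap = ring_distance(tdt, tdh)      # slots from TX tail forward to TX head
tx_pending = ring_distance(tdh, tdt)  # slots from TX head forward to TX tail
rx_span = ring_distance(rdh, rdt)     # slots from RX head forward to RX tail

print(tx_gap, tx_pending, rx_span)    # 31 993 1023
```

With these numbers the TX gap equals the logged "TX descriptors avail = 31",
and the RX tail sits one slot behind the RX head.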

net.inet.ip.intr_queue_maxlen: 4096
net.inet.ip.intr_queue_drops: 0
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0xa01f class=0x02
dev.em.0.%parent: pci2
dev.em.0.nvm: -1
dev.em.0.debug: -1
dev.em.0.rx_int_delay: 200
dev.em.0.tx_int_delay: 200
dev.em.0.rx_abs_int_delay: 4096
dev.em.0.tx_abs_int_delay: 4096
dev.em.0.rx_processing_limit: 100
dev.em.0.flow_control: 3
dev.em.0.eee_control: 0
dev.em.0.link_irq: 648
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.rx_overruns: 0
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1477444168
dev.em.0.rx_control: 100827170
dev.em.0.fc_high_water: 11264
dev.em.0.fc_low_water: 9764
dev.em.0.queue0.txd_head: 975
dev.em.0.queue0.txd_tail: 944
dev.em.0.queue0.tx_irq: 2762762
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 960
dev.em.0.queue0.rxd_tail: 959
dev.em.0.queue0.rx_irq: 3895860
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.

Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-10-27 Thread Arnaud Lacombe
Hi,

On Thu, Oct 27, 2011 at 2:29 AM, Emil Muratov  wrote:
>
>
>> Hi,
>>
>> Can you please post the output of these commands _when_ the problem
>> happens?
>>
>> uname -a
>> sysctl dev.em
>> netstat -ind
>> ifconfig
>>
>
> Hi Hooman
>
> Here is what I've got when the script triggered just in time when the
> interface was locked
>
>
> 11.10.26-23:39:10 ... interface em0 is down...
>
> FreeBSD ion.hotplug.ru 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Oct 20 20:20:25
>
Please upgrade to 8-STABLE, similar issues have been fixed there.

Thanks,
 - Arnaud

> MSD 2011     r...@epia.home
> .lan:/usr/obj/usr/src/sys/ION6debug  amd64
> 11:39PM  up  1:12, 2 users, load averages: 0.26, 0.48, 0.58
>
>
>  == vmstat -i ==
> interrupt                          total       rate
> irq22: nfe0                     16644480       3865
> cpu0: timer                      8610122       1999
> irq256: ahci0                     606705        140
> irq257: em0:rx 0                 3896622        904
> irq258: em0:tx 0                 2762957        641
> irq259: em0:link                     620          0
> cpu3: timer                      8609499       1999
> cpu1: timer                      8609499       1999
> cpu2: timer                      8609499       1999
> Total                           58350003      13550
>
>  == netstat -ind ==
> Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll Drop
> usbus     0                                        0     0     0        0     0     0    0
> usbus     0                                        0     0     0        0     0     0    0
> nfe0   1500               00:25:22:21:86:89  7157140     0     0 12266747     0     0    0
> nfe0   1500 fe80::225:22f fe80::225:22ff:fe        0     -     -       85     -     -    -
> nfe0   1500 10.16.128.0/1 10.16.189.71             0     -     -    48135     -     -    -
> em0    9000               00:1b:21:ab:bf:4a  5465087   623     0  2862028     0     0  113
> em0    9000 192.168.168.0 192.168.168.1       764085     -     -  1005078     -     -    -
> em0    9000 fe80::21b:21f fe80::21b:21ff:fe       45     -     -      252     -     -    -
> em0    9000 2002:d58d:871 2002:d58d:8715:1:       73     -     -       38     -     -    -
> wifi   1500               00:1b:21:ab:bf:4a      347     0     0      350     0     0    0
> wifi   1500 192.168.168.6 192.168.168.65           0     -     -        0     -     -    -
> wifi   1500 fe80::225:x   fe80::225:x:x            0     -     -      349     -     -    -
> wifi   1500 2002:x:x      2002:x:x:2:              0     -     -        0     -     -    -
> wifio  1500               00:1b:21:ab:bf:4a    59559     0     0   114639     0     0    0
> wifio  1500 192.168.168.8 192.168.168.81           0     -     -      160     -     -    -
> wifio  1500 fe80::225:x   fe80::225:x:x            0     -     -        0     -     -    -
> stf0   1280                                     5725     0     0     6125   420     0    0
> stf0   1280 2002:x:x      2002:x:x::1           1878     -     -     1121     -     -    -
> ng0*   1500                                        0     0     0        0     0     0    0
> ng1*   1500                                        0     0     0        0     0     0    0
> ng2    1492                                  7143733     0     0 12234436     0     0    0
> ng2    1492 213.141.x.x   213.141.x.x        4735932     -     -  8480089     -     -    -
> ng2    1492 fe80::x:x     fe80::x:x:x              0     -     -        1     -     -    -
> tun0   1455                                      350     0     0      172     0     0    0
> tun0   1455 fe80::225:x   fe80::225:x:x            0     -     -        2     -     -    -
> tun0   1455 192.168.169.1 192.168.169.1          117     -     -      167     -     -    -
>
> Oct 26 23:39:11 ion kernel: em0: hw tdh = 975, hw tdt = 944
> Oct 26 23:39:11 ion kernel: em0: hw rdh = 960, hw rdt = 959
> Oct 26 23:39:11 ion kernel: em0: Tx Queue Status = 1
> Oct 26 23:39:11 ion kernel: em0: TX descriptors avail = 31
> Oct 26 23:39:11 ion kernel: em0: Tx Descriptors avail failure = 0
> Oct 26 23:39:11 ion kernel: em0: RX discarded packets = 0
> Oct 26 23:39:11 ion kernel: em0: RX Next to Check = 960
> Oct 26 23:39:11 ion kernel: em0: RX Next to Refresh = 959
>
> net.inet.ip.intr_queue_maxlen: 4096
> net.inet.ip.intr_queue_drops: 0
> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
> dev.em.0.%driver: em
> dev.em.0.%location: slot=0 function=0
> dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0xa01f class=0x02
> dev.em.0.%parent: pci2
> dev.em.0.nvm: -1
> dev.em.0.debug: -1
> dev.em.0.rx_int_delay: 200
> dev.em.0.tx_int_delay: 200
> dev.em.0.rx_abs_int_delay: 4096
> dev.em.0.tx_abs_int_delay: 4096
> dev.em.0.rx_processing_limit: 100
> dev.em.0.flow_control: 3
> dev.em.0.eee_control: 0
> dev.em.0.link_irq: 648
> dev.em.0.mbuf_alloc_fail: 0
> dev.em.0.cluster_alloc_fail: 0
> dev.em.0.dropped: 0
> dev.em.0.tx_dma_fail: 0
> dev.em.0.rx_overruns: 0
> dev.em.0.watchdog_timeouts: 0
> dev.em.0.device_control: 1477444168
> d

Re: SCTP : problems in sending ASCONF chunks

2011-10-27 Thread Michael Tüxen
OK, please try two fixes:

http://svn.freebsd.org/changeset/base/226868
This fixes a problem which resulted in the ASCONF chunks not being sent.

http://svn.freebsd.org/changeset/base/226869
This fixes a problem which resulted in the path confirmation chunks
being sent to the wrong destination and therefore not confirming the
path.

Please let me know if you still have any issues.

Best regards
Michael
On Oct 24, 2011, at 4:20 AM, jyl_2006 wrote:

> Hi, Tüxen
> 
> I will provide more detail.
> The topology is:
>               1 (192.168.1.20)
> computer A                      --- 1 computer B (192.168.1.80)
>               2 (192.168.1.50)
> This means computer A has two wireless cards, which I name A_1 and A_2;
> computer B has one wireless card, named B_1.
> 
> Now A_1 initiates an association with B_1. Here are the messages captured
> with Wireshark:
> INIT:
> Internet Protocol, Src: 192.168.1.20 (192.168.1.20), Dst: 192.168.1.80
> (192.168.1.80)
> Supported Extensions parameter (Supported types: ASCONF, ASCONF_ACK,
> FORWARD_TSN, PKTDROP, STREAM_RESET, AUTH)
> INIT ACK:
> Internet Protocol, Src: 192.168.1.80 (192.168.1.80), Dst: 192.168.1.20
> (192.168.1.20)
> Supported Extensions parameter (Supported types: ASCONF, ASCONF_ACK,
> FORWARD_TSN, PKTDROP, STREAM_RESET, AUTH)
> 
> Debug message:
> SCTP_SACK process cum_ack:45452434 num_seg:0 a_rwnd:1864135
> Check for chunk output prw:1864135 tqe:1 tf=0
> Send called addr:0xc591c980 send length 2
> Calling ipv4 output routine from low level src addr:c0a80114
> Destination is c0a80150
> RTP route is 0xc5caaaf8 through
> IP output returns 0
> m-c-o put out 1
> Ok, we have put out 1 chunks
> USR Send complete qo:0 prw:1863859 unsent:18 tf:18 cooq:1 toqs:18 err:0
> Ok laddr->ifa:0xc5baab00 is possible, asconf_queue_mgmt: inserted asconf
> ADD_IP_ADDRESS: IPv4 address: 192.168.1.50:0
> m-c-o put out 0
> Ok, we have put out 0 chunks
> sctp_input() length:28 iphlen:20
> sctp_input(): Packet of length 48 received on wlan0 with csum_flags 0x0.
> Ok, Common input processing called, m:0xc72dab00 iphlen:20 offset:32
> length:48 stcb:0xcc2335dc
> stcb:0xcc2335dc state:8
> sctp_process_control: iphlen=20, offset=32, length=48 stcb:0xcc2335dc
> sctp_process_control: processing a chunk type=3, len=16
> 
> Actually, I also tested the following topology:
>               1 (192.168.1.20) --- 1 (192.168.1.80)
> computer A                                            computer B
>               2 (192.168.2.20) --- 2 (192.168.2.80)
> This means computer A has two wireless cards, which I name A_1 and A_2;
> computer B has two wireless cards, named B_1 and B_2.
> 
> Wireshark and the debug messages show the same results.
> 
> 
> --
> View this message in context: 
> http://freebsd.1045724.n5.nabble.com/SCTP-problems-in-sending-ASCONF-chunks-tp4929128p4931035.html
> Sent from the freebsd-net mailing list archive at Nabble.com.
> 
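
The debug output in the quoted message prints the addresses as raw hex
("src addr:c0a80114", "Destination is c0a80150"). Reading those as IPv4
addresses in network byte order, which is an assumption about the debug
format, they can be decoded with the Python standard library as sketched
below. The helper name is hypothetical.

```python
import ipaddress

def hex_to_ipv4(hex_addr):
    """Decode an 8-digit hex string (network byte order) to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))

print(hex_to_ipv4("c0a80114"))  # 192.168.1.20
print(hex_to_ipv4("c0a80150"))  # 192.168.1.80
```

Under that reading, the source is A_1 and the destination is B_1 from the
first topology.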



Re: 9.0-RC1 panic in tcp_input: negative window.

2011-10-27 Thread Lawrence Stewart

On 10/26/11 22:53, John Baldwin wrote:

On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote:

On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote:

On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:

On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:

On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:

My suggestion would be that if we won't be able to fix it before 9.0,
we should turn this assertion off, as the system seems to be able to
recover.


Shipped kernels have all assertions turned off.


Yes, I'm aware of that, but many people compile their production kernels
with INVARIANTS/INVARIANT_SUPPORT to fail early instead of e.g.
corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
it into a printf, so it will be visible.


No, the kernel is corrupting things in other places when this is true, so
if you are running with INVARIANTS, we want to know about it.  Specifically,
in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on
being able to do 'rcv_adv - rcv_nxt'.

In this case, it looks like the difference is consistently less than one
frame.  I suspect the other end of the connection is sending just beyond the
end of the advertised window (it probably assumes it is better to send a full
frame if it has that much pending data even though part of it is beyond the
window edge vs sending a truncated packet that just fills the window) and that
that frame is accepted ok in the header prediction case and its ACK is
delayed, but the next packet to arrive then trips over this assumption.

Since 'win' is guaranteed to be non-negative and we explicitly cast
'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
for:

tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));

I think we already handle this case ok and perhaps the assertion can just be
removed?  Not sure if others feel that it warrants a comment to note that this
is the case being handled.
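
The arithmetic behind that line can be illustrated outside the kernel. The
sketch below emulates the 32-bit unsigned subtraction and the (int) cast in
Python and feeds it the rcv_nxt/rcv_adv pair from the panic message quoted
further down in this message. The helper names are stand-ins written for
this illustration, not kernel code.

```python
def seq_diff(a, b):
    """Emulate (int)(a - b) for 32-bit unsigned TCP sequence numbers."""
    d = (a - b) & 0xFFFFFFFF                        # unsigned 32-bit wraparound
    return d - (1 << 32) if d & 0x80000000 else d   # reinterpret as signed

def imax(x, y):
    """Stand-in for the kernel's imax(): the larger of two ints."""
    return x if x > y else y

# Values from the panic message in this thread
rcv_nxt, rcv_adv, win = 4022394563, 4022394342, 0
diff = seq_diff(rcv_adv, rcv_nxt)
rcv_wnd = imax(win, diff)
print(diff, rcv_wnd)  # -221 0
```

Because the cast makes the difference negative rather than a huge unsigned
value, imax() falls back to win, which is the sense in which this case is
already handled.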


I added debug to the places where rcv_adv and rcv_nxt are modified. Here
is what happens before the panic occurs:

tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221
panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221

I can send you the full log if you want; I have plenty of messages where
rcv_adv < rcv_nxt, and not all of them trigger this assertion.


The assertion would be triggered when the next packet arrives (as I said
above).  Try modifying your debugging output to also log if the ACK is
delayed.  I suspect it is not delayed until the last one.  (Pushing out an
ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case
of an immediate ACK, rcv_nxt > rcv_adv is only a transient condition all
under a single lock invocation so never visible to other consumers of the
protocol control block.)  If that is what you see, then that confirms what
I guessed above and I will likely just remove the assertion in tcp_input()
and patch the timewait code to handle this case.



Pawel, have you been able to confirm John's hypothesis? What I don't 
quite get is why we haven't had a lot more reports of this issue...


Cheers,
Lawrence


Re: 9.0-RC1 panic in tcp_input: negative window.

2011-10-27 Thread Pawel Jakub Dawidek
On Fri, Oct 28, 2011 at 11:29:34AM +1100, Lawrence Stewart wrote:
> On 10/26/11 22:53, John Baldwin wrote:
> > The assertion would be triggered when the next packet arrives (as I said
> > above).  Try modifying your debugging output to also log if the ACK is
> > delayed.  I suspect it is not delayed until the last one.  (Pushing out an
> > ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case
> > of an immediate ACK, rcv_nxt > rcv_adv is only a transient condition all
> > under a single lock invocation so never visible to other consumers of the
> > protocol control block.)  If that is what you see, then that confirms what
> > I guessed above and I will likely just remove the assertion in tcp_input()
> > and patch the timewait code to handle this case.
> >
> 
> Pawel, have you been able to confirm John's hypothesis? [...]

Yeah, sorry. I moved the debug to the points where we drop the t_inpcb
lock and I still see rcv_nxt being greater than rcv_adv:

tcp_do_segment:2970 negative window: tp 0xfe00685ee3d0 rcv_nxt 1312878324 rcv_adv 1312878187

This is just before the INP_WUNLOCK(tp->t_inpcb) under 'check_delack'
label. I see this a lot (it was logged 545 times for 11 different tp
pointers during 24h period).

tcp_do_segment:3009 negative window: tp 0xfe005cfc6000 rcv_nxt 1442546453 rcv_adv 1442545722

This is just before calling tcp_output(). This one was logged 65 times
for 3 different tp pointers.
I placed a debug also after tcp_output() call, but it is not logged, so
once we return from tcp_output() everything is fine.

The panic would be triggered 115 times for 5 different tp pointers
during that time.

I write 'tp pointers' as I'm not 100% sure if the same pointer always
represents the same connection or if it is reused.
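
Counts like the ones above (hits per debug site and distinct tp pointers per
site) can be pulled out of such a log with a few lines of scripting. This is
only a sketch: it assumes each debug message is on a single line in the
format shown above, and it makes no claim about how the numbers in this
message were actually produced.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(
    r"tcp_do_segment:(\d+) negative window: tp (0x[0-9a-f]+) "
    r"rcv_nxt (\d+) rcv_adv (\d+)"
)

def summarize(log_text):
    """Map source line number -> (hit count, number of distinct tp pointers)."""
    hits = defaultdict(int)
    tps = defaultdict(set)
    for m in LINE_RE.finditer(log_text):
        site = int(m.group(1))
        hits[site] += 1
        tps[site].add(m.group(2))
    return {site: (hits[site], len(tps[site])) for site in hits}

# The two example lines quoted above
sample = (
    "tcp_do_segment:2970 negative window: tp 0xfe00685ee3d0 "
    "rcv_nxt 1312878324 rcv_adv 1312878187\n"
    "tcp_do_segment:3009 negative window: tp 0xfe005cfc6000 "
    "rcv_nxt 1442546453 rcv_adv 1442545722\n"
)
print(summarize(sample))
```

As noted above, a tp pointer may be reused, so the distinct-pointer count is
at best an approximation of the number of connections.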

> [...] What I don't 
> quite get is why we haven't had a lot more reports of this issue...

Maybe because my TCP/IP stack is heavily modified? ...not:)

No idea to be honest. Ask Ken to turn on INVARIANTS in 9.0-RC2 and we
will see:)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com

