Re: igb dual-port adapter 1200Mbps limit - what to tune?

2010-11-11 Thread Eugene Perevyazko
On Thu, Nov 11, 2010 at 01:47:02AM +0100, Ivan Voras wrote:
> On 11/10/10 12:04, Eugene Perevyazko wrote:
> 
> >CPU is e5...@2.4ghz, 8 cores, irqs bound to different cores skipping HT 
> >ones.
> 
> Unless you need the CPU cores for other tasks on the server, they won't 
> help you with network throughput here. Faster but fewer cores might.
> 
> >Tried 2 queues and 1 queue per iface, neither hitting cpu limit.
> 
> Are you sure you are not hitting the CPU limit on individual cores? Have 
> you tried running "top -H -S"?
> 
Sure, even with 1 queue per iface the load is 40-60% on the busiest core; with 2 queues
it was much lower.
Now I've got the module for the motherboard with 2 more ports; going to see if it helps.

-- 
Eugene Perevyazko
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


FreeBSD TCP Behavior with Linux NAT

2010-11-11 Thread Christopher Penney
Hi,

I have a curious problem I'm hoping someone can help with or at least
educate me on.

I have several large Linux clusters and for each one we hide the compute
nodes behind a head node using NAT.  Historically, this has worked very well
for us and any time a NAT gateway (the head node) reboots everything
recovers within a minute or two of it coming back up.  This includes NFS
mounts from Linux and Solaris NFS servers, license server connections, etc.

Recently, we added a FreeBSD based NFS server to our cluster resources and
have had significant issues with NFS mounts hanging if the head node
reboots.  We don't have this happen much, but it does occasionally happen.
 I've explored this and it seems the behavior of FreeBSD differs a bit from
at least Linux and Solaris with respect to TCP recovery.  I'm curious if
someone can explain this or offer any workarounds.

Here are some specifics from a test I ran:

Before the reboot two Linux clients were mounting the FreeBSD server.  They
were both using port 903 locally.  On the head node clientA:903 was remapped
to headnode:903 and clientB:903 was remapped to headnode:601.  There is no
activity when the reboot occurs.  The head node takes a few minutes to come
back up (we kept it down for several minutes).

When it comes back up clientA and clientB try to reconnect to the FreeBSD
NFS server.  They both use the same source port, but since the head node's
conntrack table is cleared it's a race to see who gets what port and this
time clientA:903 appears as headnode:601 and clientB:903 appears as
headnode:903 ( >>> they essentially switch places as far as the FreeBSD
server would see <<< ).

The FreeBSD NFS server, since there were no outstanding ACKs it was waiting
on, thinks things are OK, so when it gets a SYN from the two clients it only
responds with an ACK.  The ACK for each that it replies with is bogus
(invalid seq number) because it's using the return path the other client was
using before the reboot so the client sends a RST back, but it never gets to
the FreeBSD system since the head node's NAT hasn't yet seen the full
handshake (that would allow return packets).  The end result is a
"permanent" hang (at least until it would otherwise cleanup idle TCP
connections).

This is in stark contrast to the behavior of the other systems we have.
 Other systems respond to the SYN used to reconnect with a SYN/ACK.  They
appear to implicitly tear down the return path based on getting a SYN from a
seemingly already established connection.

I'm assuming this is one of the grey areas where there is no specific
behavior outlined in an RFC?  Is there any way to make the FreeBSD system
more reliable in this situation (like making it implicitly tear down the
return)?  Or is there a way to adjust the NAT setup to allow the RST to
return to the FreeBSD system?  Currently, NAT is setup with simply:

iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to 1.2.3.4

Where 1.2.3.4 is the intranet address and 10.1.0.0 is the cluster network.
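
As an illustrative aside (an untested sketch, assuming the head node's NAT is
netfilter/conntrack based; whether these knobs actually help the RST path here
is an assumption):

# log whether conntrack classifies the stray ACK/RST segments as INVALID
iptables -A FORWARD -m state --state INVALID -j LOG --log-prefix "ct-invalid: "
# relax conntrack's TCP window/state checking so out-of-state segments are
# still translated back instead of being dropped
sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1
sysctl -w net.netfilter.nf_conntrack_tcp_loose=1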

Thanks!

Chris (not a list subscriber -- please CC if you can)
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: bjoern just received the itojun award at the ietf

2010-11-11 Thread jhell
On 11/10/2010 04:15, Randy Bush wrote:
> bjoern zeeb just received the itojun award.  congratulations, bjoern.
> and thank you for all the hard work on the ipv6 stack.
> 
> randy

For those not understanding what this is or what it's about:

http://www.isoc.org/awards/itojun/

Where you will find Bjoern on the front page.


Congrats, Bjoern! You deserve it.

-- 

 jhell,v
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]

2010-11-11 Thread Kevin Oberman
> Date: Wed, 10 Nov 2010 23:49:56 -0800 (PST)
> From: Kirill Yelizarov 
> 
> 
> 
> --- On Thu, 11/11/10, Kevin Oberman  wrote:
> 
> > From: Kevin Oberman 
> > Subject: Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]
> > To: "Wilkinson, Alex" 
> > Cc: freebsd-sta...@freebsd.org
> > Date: Thursday, November 11, 2010, 8:26 AM
> > > Date: Thu, 11 Nov 2010 13:01:26 +0800
> > > From: "Wilkinson, Alex" 
> > > Sender: owner-freebsd-sta...@freebsd.org
> > > 
> > > 
> > >     0n Wed, Nov 10, 2010 at 04:21:12AM -0800, Kirill Yelizarov wrote: 
> > > 
> > >     >All my em cards running 8.1 stable don't reply to icmp echo request packets larger than 1472 bytes.
> > >     >
> > >     >On stable 7.2 the same hardware works as expected:
> > >     ># ping -s 1500 192.168.64.99
> > >     >PING 192.168.64.99 (192.168.64.99): 1500 data bytes
> > >     >1508 bytes from 192.168.64.99: icmp_seq=0 ttl=63 time=1.249 ms
> > >     >1508 bytes from 192.168.64.99: icmp_seq=1 ttl=63 time=1.158 ms
> > >     >
> > >     >Here is the dump on em interface
> > >     >15:06:31.452043 IP 192.168.66.65 > *: ICMP echo request, id 28729, seq 5, length 1480
> > >     >15:06:31.452047 IP 192.168.66.65 > : icmp
> > >     >15:06:31.452069 IP  > 192.168.66.65: ICMP echo reply, id 28729, seq 5, length 1480
> > >     >15:06:31.452071 IP *** > 192.168.66.65: icmp
> > >     > 
> > >     >Same ping from same source (it's a 8.1 stable with fxp interface) to em card running 8.1 stable
> > >     >#pciconf -lv
> > >     >e...@pci0:3:4:0:    class=0x02 card=0x10798086 chip=0x10798086 rev=0x03 hdr=0x00
> > >     >    vendor     = 'Intel Corporation'
> > >     >    device     = 'Dual Port Gigabit Ethernet Controller (82546EB)'
> > >     >    class      = network
> > >     >    subclass   = ethernet
> > >     >
> > >     ># ping -s 1472 192.168.64.200
> > >     >PING 192.168.64.200 (192.168.64.200): 1472 data bytes
> > >     >1480 bytes from 192.168.64.200: icmp_seq=0 ttl=63 time=0.848 ms
> > >     >^C
> > >     >
> > >     ># ping -s 1473 192.168.64.200
> > >     >PING 192.168.64.200 (192.168.64.200): 1473 data bytes
> > >     >^C
> > >     >--- 192.168.64.200 ping statistics ---
> > >     >4 packets transmitted, 0 packets received, 100.0% packet loss
> > > 
> > > works fine for me:
> > > 
> > > FreeBSD 8.1-STABLE #0 r213395
> > > 
> > > e...@pci0:0:25:0:    class=0x02 card=0x3035103c chip=0x10de8086 rev=0x02 hdr=0x00
> > >     vendor     = 'Intel Corporation'
> > >     device     = 'Intel Gigabit network connection (82567LM-3 )'
> > >     class      = network
> > >     subclass   = ethernet
> > > 
> > > #ping -s 1473 host
> > > PING host(192.168.1.1): 1473 data bytes
> > > 1481 bytes from 192.168.1.1: icmp_seq=0 ttl=253 time=31.506 ms
> > > 1481 bytes from 192.168.1.1: icmp_seq=1 ttl=253 time=31.493 ms
> > > 1481 bytes from 192.168.1.1: icmp_seq=2 ttl=253 time=31.550 ms
> > > ^C
> > 
> > The reason the '-s 1500' worked was that the packets were fragmented. If I
> > add the '-D' option, '-s 1473' fails on v7 and v8. Are the v8 systems where
> > you see it failing without the '-D' on the same network segment? If not, it
> > is likely that an intervening device is refusing to fragment the packet.
> > (Some routers deliberately don't fragment ICMP Echo Request packets.)
> 
> If i set -D -s 1473 sender side refuses to ping and that is
> correct. All mentioned above machines are behind the same router and
> switch. Same hardware running v7 is working while v8 is not. And i
> never saw such problems before.  Also correct me if i'm wrong but the
> dump shows that the packet arrived. I'll try driver from head and will
> post here results.

I did a bit more looking at this today and I see that something bogus is
going on and it MAY be the em driver.

I tried 1473 data byte pings without the DF flag. I then captured the
packets on both ends, where the sending system has a bge (Broadcom GE)
card and the responding end has an em (Intel) card.

What I saw was the fragmented IP packets all being received by the
system with the em interface and an ICMP Echo Reply being sent back,
again fragmented. I saw the reply on both ends, so both interfaces were
able to fragment an over-sized packet, transmit the two pieces, and
receive the two pieces. The em device could re-assemble them properly,
but the bge device does not seem to re-assemble them correctly, or else
has a problem with ICMP packets bigger than the MTU.

When I send from the em system, I see the packets and fragments all
arrive in good form, but the system never sends out a reply. Since this
is a kernel function, it may be a driver, but I suspect that it is in
the IP stack since I am seeing the problem with a Broadcom card and I
see the data all arriving.
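
For anyone reproducing this, a capture filter that shows both the ICMP
packets and any IP fragments is roughly the following (interface names are
placeholders):

# echo request/reply plus anything with the MF flag set or a nonzero
# fragment offset
tcpdump -n -i em0 'icmp or (ip[6:2] & 0x3fff != 0)'
tcpdump -n -i bge0 'icmp or (ip[6:2] & 0x3fff != 0)'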

I think Jack can probably relax, but some patch to th

Re: igb dual-port adapter 1200Mbps limit - what to tune?

2010-11-11 Thread Eugene Perevyazko
On Thu, Nov 11, 2010 at 12:49:52PM +0200, Eugene Perevyazko wrote:
> On Thu, Nov 11, 2010 at 01:47:02AM +0100, Ivan Voras wrote:
> > On 11/10/10 12:04, Eugene Perevyazko wrote:
> > 
> > >Tried 2 queues and 1 queue per iface, neither hitting cpu limit.
> > 
> > Are you sure you are not hitting the CPU limit on individual cores? Have 
> > you tried running "top -H -S"?
> > 
> Sure, even with 1queue per iface load is 40-60% on busy core, with 2 queues 
> it was much lower.
> Now I've got the module for mb with 2 more ports, going to see if it helps.
The IO module has em interfaces on it, and somehow I've already got 2 panics
after moving one of the vlans to it.

In the meantime, can someone explain to me what is processed by the threads
marked like "irq256: igb0" and "igb0 que"? Maybe understanding this will let me
pin those threads to cores more optimally.
There are (hw.igb.num_queues+1) "irq" threads and (hw.igb.num_queues) "que" 
threads. Now I just pin them sequentially to even cores (odd ones are HT).
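
For reference, that kind of pinning can be done explicitly along these lines
(a sketch only, assuming a cpuset(1) that accepts -x for interrupt vectors;
the vector numbers come from vmstat -i):

vmstat -i | grep igb    # lists the vectors, e.g. "irq256: igb0:que 0"
cpuset -l 0 -x 256      # bind that vector to CPU 0
cpuset -l 2 -x 257      # next vector to CPU 2, and so on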

Now I use hw.igb.num_queues=2, and with traffic limited to 1200Mbits the 
busiest core is still 60% idle...



-- 
Eugene Perevyazko
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: [patch] WOL support for nfe(4)

2010-11-11 Thread Pyun YongHyeon
On Thu, Nov 11, 2010 at 08:08:25AM +0100, Yamagi Burmeister wrote:
> On Wed, 10 Nov 2010, Pyun YongHyeon wrote:
> 
> >On Tue, Nov 09, 2010 at 01:34:21PM -0800, Pyun YongHyeon wrote:
> >>On Tue, Nov 09, 2010 at 10:01:36PM +0100, Yamagi Burmeister wrote:
> >>>On Tue, 9 Nov 2010, Pyun YongHyeon wrote:
> >>>
> >No, the link stays at 1000Mbps so the driver must manually switch back
> >to 10/100Mbps.
> >
> 
> Hmm, this is real problem for WOL. Establishing 1000Mbps link to
> accept WOL frames is really bad idea since it can draw more power
> than 375mA. Consuming more power than 375mA is violation of
> PCI specification and some system may completely shutdown the power
> to protect hardware against over-current damage which in turn means
> WOL wouldn't work anymore. Even if WOL work with 1000Mbps link for
> all nfe(4) controllers, it would dissipate much more power.
> 
> Because nfe(4) controllers are notorious for using various PHYs,
> it's hard to write a code to reliably establish 10/100Mbps link in
> driver. In addition, nfe(4) is known to be buggy in link state
> handling such that forced media selection didn't work well. I'll
> see what could be done in this week if I find spare time.
> >>>
> >>>Hmm... Maybe just add a hint to the manpage that WOL is possible broken?
> >>
> >>I think this may not be enough. Because it can damage your hardware
> >>under certain conditions if protection circuit was not there.
> >>
> >
> >Ok, I updated patch which will change link speed to 10/100Mps when
> >shutdown/suspend is initiated.  You can get the patch at the
> >following URL. Please give it a try and let me know whether it
> >really changes link speed to 10/100Mbps. If it does not work as
> >expected, show me the dmesg output of your system.
> >
> >http://people.freebsd.org/~yongari/nfe/nfe.wol.patch2
> 
> Okay, that does the trick. At shutdown the link speed is changed to
> 10/100Mbps, at boot - either via WOL magic packet or manual startup -
> it's changed back to 1000Mbps.
> 

Thanks, patch committed (r215132); will MFC after a week.
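
As a side note, whether WOL is actually armed on the interface can be checked
and set from ifconfig; a minimal example, assuming nfe0 exposes the WOL
capability flags:

ifconfig -m nfe0 | grep -i wol    # look for WOL_MAGIC in the capabilities
ifconfig nfe0 wol_magic           # arm magic-packet wakeup before shutdown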

> Thanks again,
> Yamagi
> 
> -- 
> Homepage: www.yamagi.org
> Jabber:   yam...@yamagi.org
> GnuPG/GPG:0xEFBCCBCB
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: igb dual-port adapter 1200Mbps limit - what to tune?

2010-11-11 Thread Aleksandr A Babaylov
On Thu, Nov 11, 2010 at 08:05:40PM +0200, Eugene Perevyazko wrote:
> On Thu, Nov 11, 2010 at 12:49:52PM +0200, Eugene Perevyazko wrote:
> > On Thu, Nov 11, 2010 at 01:47:02AM +0100, Ivan Voras wrote:
> > > On 11/10/10 12:04, Eugene Perevyazko wrote:
> > > 
> > > >Tried 2 queues and 1 queue per iface, neither hitting cpu limit.
> > > 
> > > Are you sure you are not hitting the CPU limit on individual cores? Have 
> > > you tried running "top -H -S"?
> > > 
> > Sure, even with 1queue per iface load is 40-60% on busy core, with 2 queues 
> > it was much lower.
> > Now I've got the module for mb with 2 more ports, going to see if it helps.
> The IO module has em interfaces on it and somehow I've already got 2 panics
> after moving one of vlans to it.
> 
> In the mean time, can someone explain me what is processed by threads marked 
> like "irq256: igb0" and "igb0 que". May be understanding this will let me
> pin those threads to cores more optimally.
> There are (hw.igb.num_queues+1) "irq" threads and (hw.igb.num_queues)
> "que" threads. Now I just pin them sequentially to even cores (odd ones are 
> HT).
As far as I understand, you are not right about the HT cores.
Try switching HT off; as a rule, do not use HT on routers.
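
To see which logical CPUs are HT siblings before deciding, the scheduler
topology can be inspected (a generic check, not igb-specific); disabling HT
itself is done in the BIOS:

sysctl kern.sched.topology_spec   # HT sibling groups carry a THREAD/SMT/HTT flag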

> Now I use hw.igb.num_queues=2, and with traffic limited to 1200Mbits the 
> busiest core is still 60% idle...
> 
> 
> 
> -- 
> Eugene Perevyazko
> ___
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: igb dual-port adapter 1200Mbps limit - what to tune?

2010-11-11 Thread Jack Vogel
The driver already handles the pinning; you shouldn't need to mess with it.

MSI-X interrupts start at 256. The igb driver uses one vector per queue,
which is a TX/RX pair. The driver creates as many queues as there are
cores, up to a max of 8.
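
A quick way to see those vectors and the resulting layout on a running box
(illustrative commands; the loader tunable is the one already mentioned
earlier in this thread):

vmstat -i | grep igb                              # the per-queue vectors (plus any link vector)
echo 'hw.igb.num_queues=2' >> /boot/loader.conf   # cap the queue count at next boot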

Jack


On Thu, Nov 11, 2010 at 10:05 AM, Eugene Perevyazko  wrote:

> On Thu, Nov 11, 2010 at 12:49:52PM +0200, Eugene Perevyazko wrote:
> > On Thu, Nov 11, 2010 at 01:47:02AM +0100, Ivan Voras wrote:
> > > On 11/10/10 12:04, Eugene Perevyazko wrote:
> > >
> > > >Tried 2 queues and 1 queue per iface, neither hitting cpu limit.
> > >
> > > Are you sure you are not hitting the CPU limit on individual cores?
> Have
> > > you tried running "top -H -S"?
> > >
> > Sure, even with 1queue per iface load is 40-60% on busy core, with 2
> queues it was much lower.
> > Now I've got the module for mb with 2 more ports, going to see if it
> helps.
> The IO module has em interfaces on it and somehow I've already got 2 panics
> after moving one of vlans to it.
>
> In the mean time, can someone explain me what is processed by threads
> marked
> like "irq256: igb0" and "igb0 que". May be understanding this will let me
> pin those threads to cores more optimally.
> There are (hw.igb.num_queues+1) "irq" threads and (hw.igb.num_queues) "que"
> threads. Now I just pin them sequentially to even cores (odd ones are HT).
>
> Now I use hw.igb.num_queues=2, and with traffic limited to 1200Mbits the
> busiest core is still 60% idle...
>
>
>
> --
> Eugene Perevyazko
> ___
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
>
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


ML370 G4 with poor Network Performance and high CPU Load

2010-11-11 Thread r...@reckschwardt.de

 Hello,

I am new to this mailing list. I use an ML370 G4 with FreeBSD 8.1 amd64 and
test with netio over TCP. The NICs used are the onboard Broadcom
(PCI-X 133MHz), a Broadcom PCI-X NIC and an Intel PCI-X NIC. The CPU
load is around 35% and the performance looks like this:


Packet size  1k bytes:  99303 KByte/s Tx,  44576 KByte/s Rx.
Packet size  2k bytes:  72043 KByte/s Tx,  75200 KByte/s Rx.
Packet size  4k bytes:  23280 KByte/s Tx,  66072 KByte/s Rx.
Packet size  8k bytes:  55234 KByte/s Tx,  64470 KByte/s Rx.
Packet size 16k bytes:  82485 KByte/s Tx,  74099 KByte/s Rx.
Packet size 32k bytes:  93133 KByte/s Tx,  74992 KByte/s Rx.

I try the following tuning:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.inflight.enable=0
net.inet.tcp.hostcache.expire=1

but this is not helpful; the load goes to 60% and the performance is
still poor. How can I fix this problem?
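
A few generic things to look at while netio is running, before touching more
sysctls (nothing here is specific to the ML370):

top -SH          # per-thread view: is one interrupt or kernel thread saturated?
vmstat -i        # interrupt rates per NIC
netstat -w 1     # packets and errors per second during the test
pciconf -lcbv    # confirm the PCI-X cards really attached at 64-bit/133MHz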


thanks for response ré


P.S. The same computer with Linux performs perfectly, with 1-2%
load.




Re: NFS + FreeBSD TCP Behavior with Linux NAT

2010-11-11 Thread Julian Elischer

On 11/11/10 6:36 AM, Christopher Penney wrote:

Hi,

I have a curious problem I'm hoping someone can help with or at least
educate me on.

I have several large Linux clusters and for each one we hide the compute
nodes behind a head node using NAT.  Historically, this has worked very well
for us and any time a NAT gateway (the head node) reboots everything
recovers within a minute or two of it coming back up.  This includes NFS
mounts from Linux and Solaris NFS servers, license server connections, etc.

Recently, we added a FreeBSD based NFS server to our cluster resources and
have had significant issues with NFS mounts hanging if the head node
reboots.  We don't have this happen much, but it does occasionally happen.
  I've explored this and it seems the behavior of FreeBSD differs a bit from
at least Linux and Solaris with respect to TCP recovery.  I'm curious if
someone can explain this or offer any workarounds.

Here are some specifics from a test I ran:

Before the reboot two Linux clients were mounting the FreeBSD server.  They
were both using port 903 locally.  On the head node clientA:903 was remapped
to headnode:903 and clientB:903 was remapped to headnode:601.  There is no
activity when the reboot occurs.  The head node takes a few minutes to come
back up (we kept it down for several minutes).

When it comes back up clientA and clientB try to reconnect to the FreeBSD
NFS server.  They both use the same source port, but since the head node's
conntrack table is cleared it's a race to see who gets what port and this
time clientA:903 appears as headnode:601 and clientB:903 appears as
headnode:903 (>>>  they essentially switch places as far as the FreeBSD
server would see<<<  ).

The FreeBSD NFS server, since there was no outstanding acks it was waiting
on, thinks things are ok so when it gets a SYN from the two clients it only
responds with an ACK.  The ACK for each that it replies with is bogus
(invalid seq number) because it's using the return path the other client was
using before the reboot so the client sends a RST back, but it never gets to
the FreeBSD system since the head node's NAT hasn't yet seen the full
handshake (that would allow return packets).  The end result is a
"permanent" hang (at least until it would otherwise cleanup idle TCP
connections).

This is in stark contrast to the behavior of the other systems we have.
  Other systems respond to the SYN used to reconnect with a SYN/ACK.  They
appear to implicitly tear down the return path based on getting a SYN from a
seemingly already established connection.

I'm assuming this is one of the grey areas where there is no specific
behavior outlined in an RFC?  Is there any way to make the FreeBSD system
more reliable in this situation (like making it implicitly tear down the
return)?  Or is there a way to adjust the NAT setup to allow the RST to
return to the FreeBSD system?  Currently, NAT is setup with simply:

iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to 1.2.3.4

Where 1.2.3.4 is the intranet address and 10.1.0.0 is the cluster network.


I just added NFS to the subject because the NFS people are those you
need to connect with.

Thanks!

 Chris
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Poor situation with snmp support in FreeBSD

2010-11-11 Thread Brandon Gooch
On Fri, Mar 26, 2010 at 4:01 AM, Hartmut Brandt  wrote:
>> VS>I, probably, was to verbose, and didn't make myself clear enough. For now,
>> VS>from network admin point of view, it's 3 problems:
>> VS>1) No ARP support
>
> The ARP table should be there. It may be that it got 'lost' with the ARP
> changes last year. So this should be fixable. The ARP table is the old
> one, though.

Old thread, but I just recently bumped into this problem...

Does anyone CC'd in this exchange know how to go about fixing this?
Perhaps a pointer to a document describing the changes in ARP that
broke this?

Seems that net-snmp manages to gather this info:

# snmpwalk -v 1 -c public 192.168.1.1

...
IP-MIB::ipNetToMediaPhysAddress.1.192.168.0.105 = Hex-STRING: 00 21 E1 FB 25 2D
IP-MIB::ipNetToMediaPhysAddress.1.192.168.0.0 = Hex-STRING: 00 13 20 2E 89 61
IP-MIB::ipNetToMediaPhysAddress.1.192.168.0.168 = Hex-STRING: 00 11 43 A3 1C 1F
IP-MIB::ipNetToMediaPhysAddress.1.192.168.0.194 = Hex-STRING: 00 60 97 92 59 64
...
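
For comparison, the kernel's own view of the same table (a quick cross-check
that the data bsnmpd should be exporting is actually there):

arp -an                                                         # the local ARP table
snmpwalk -v 1 -c public 192.168.1.1 IP-MIB::ipNetToMediaTable   # walk just this table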

Thanks for any pointers...

-Brandon
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]

2010-11-11 Thread Pyun YongHyeon
On Thu, Nov 11, 2010 at 08:10:57AM -0800, Kevin Oberman wrote:
> > Date: Wed, 10 Nov 2010 23:49:56 -0800 (PST)
> > From: Kirill Yelizarov 
> > 
> > 
> > 
> > --- On Thu, 11/11/10, Kevin Oberman  wrote:
> > 
> > > From: Kevin Oberman 
> > > Subject: Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]
> > > To: "Wilkinson, Alex" 
> > > Cc: freebsd-sta...@freebsd.org
> > > Date: Thursday, November 11, 2010, 8:26 AM
> > > > Date: Thu, 11 Nov 2010 13:01:26 +0800
> > > > From: "Wilkinson, Alex" 
> > > > Sender: owner-freebsd-sta...@freebsd.org
> > > > 
> > > > 
> > > >     0n Wed, Nov 10, 2010 at 04:21:12AM -0800, Kirill Yelizarov wrote: 
> > > > 
> > > >     >All my em cards running 8.1 stable don't reply to icmp echo request packets larger than 1472 bytes.
> > > >     >
> > > >     >On stable 7.2 the same hardware works as expected:
> > > >     ># ping -s 1500 192.168.64.99
> > > >     >PING 192.168.64.99 (192.168.64.99): 1500 data bytes
> > > >     >1508 bytes from 192.168.64.99: icmp_seq=0 ttl=63 time=1.249 ms
> > > >     >1508 bytes from 192.168.64.99: icmp_seq=1 ttl=63 time=1.158 ms
> > > >     >
> > > >     >Here is the dump on em interface
> > > >     >15:06:31.452043 IP 192.168.66.65 > *: ICMP echo request, id 28729, seq 5, length 1480
> > > >     >15:06:31.452047 IP 192.168.66.65 > : icmp
> > > >     >15:06:31.452069 IP  > 192.168.66.65: ICMP echo reply, id 28729, seq 5, length 1480
> > > >     >15:06:31.452071 IP *** > 192.168.66.65: icmp
> > > >     > 
> > > >     >Same ping from same source (it's a 8.1 stable with fxp interface) to em card running 8.1 stable
> > > >     >#pciconf -lv
> > > >     >e...@pci0:3:4:0:    class=0x02 card=0x10798086 chip=0x10798086 rev=0x03 hdr=0x00
> > > >     >    vendor     = 'Intel Corporation'
> > > >     >    device     = 'Dual Port Gigabit Ethernet Controller (82546EB)'
> > > >     >    class      = network
> > > >     >    subclass   = ethernet
> > > >     >
> > > >     ># ping -s 1472 192.168.64.200
> > > >     >PING 192.168.64.200 (192.168.64.200): 1472 data bytes
> > > >     >1480 bytes from 192.168.64.200: icmp_seq=0 ttl=63 time=0.848 ms
> > > >     >^C
> > > >     >
> > > >     ># ping -s 1473 192.168.64.200
> > > >     >PING 192.168.64.200 (192.168.64.200): 1473 data bytes
> > > >     >^C
> > > >     >--- 192.168.64.200 ping statistics ---
> > > >     >4 packets transmitted, 0 packets received, 100.0% packet loss
> > > > 
> > > > works fine for me:
> > > > 
> > > > FreeBSD 8.1-STABLE #0 r213395
> > > > 
> > > > e...@pci0:0:25:0:    class=0x02 card=0x3035103c chip=0x10de8086 rev=0x02 hdr=0x00
> > > >     vendor     = 'Intel Corporation'
> > > >     device     = 'Intel Gigabit network connection (82567LM-3 )'
> > > >     class      = network
> > > >     subclass   = ethernet
> > > > 
> > > > #ping -s 1473 host
> > > > PING host(192.168.1.1): 1473 data bytes
> > > > 1481 bytes from 192.168.1.1: icmp_seq=0 ttl=253 time=31.506 ms
> > > > 1481 bytes from 192.168.1.1: icmp_seq=1 ttl=253 time=31.493 ms
> > > > 1481 bytes from 192.168.1.1: icmp_seq=2 ttl=253 time=31.550 ms
> > > > ^C
> > > 
> > > The reason the '-s 1500' worked was that the packets were fragmented. If I
> > > add the '-D' option, '-s 1473' fails on v7 and v8. Are the v8 systems where
> > > you see it failing without the '-D' on the same network segment? If not, it
> > > is likely that an intervening device is refusing to fragment the packet.
> > > (Some routers deliberately don't fragment ICMP Echo Request packets.)
> > 
> > If i set -D -s 1473 sender side refuses to ping and that is
> > correct. All mentioned above machines are behind the same router and
> > switch. Same hardware running v7 is working while v8 is not. And i
> > never saw such problems before.  Also correct me if i'm wrong but the
> > dump shows that the packet arrived. I'll try driver from head and will
> > post here results.
> 
> I did a bit more looking at this today and I see that something bogus is
> going on and it MAY be the em driver.
> 
> I tried 1473 data byte pings without the DF flag. I then captured the
> packets on both ends (where the sending system has a bge (Broadcom GE)
> and the responding end has an em (Intel) card.
> 
> What I saw was the fragmented IP packets all being received by the
> system with the em interface and an ICMP Echo Reply being sent back,
> again fragmented. I saw the reply on both ends, so both interfaces were
> able to fragment an over-sized packet, transmit the two pieces, and
> receive the two pieces. The em device could re-assemble them properly,
> but the bge device does not seem to re-assemble them correctly or else
> has a problem with ICMP packets bigger then MTU size.
> 
> When I

Problem with re0

2010-11-11 Thread Gabor Radnai
Hi,

I have an Asus M2NPV-VM motherboard with integrated Nvidia MCP51 Gigabit
Ethernet NIC and
TP-Link TG-3468 PCIe network card which is using Realtek 8111 chip.

I have a problem with the re driver: the Nvidia network interface is working
properly, but the other one, though it seems to be recognized by the OS,
I cannot use. Sporadically it stays down, and when it does come up it does
not get an IP address via DHCP, nor does it help if I set a static IP
address. I can manipulate it via ifconfig, but it is unreachable via IP.

I replaced the cable, swapped in the cable that works with the Nvidia NIC,
and restarted the switch/router, but no luck so far.
Using this NIC in a Windows machine, it works. Using my Asus motherboard
with an Ubuntu Live CD, the card works.

Can it be a driver bug, or is this type of chip not supported by the re driver?

Thanks,
Gabor

uname -v
FreeBSD 8.1-RELEASE #0 r210200M: Wed Jul 21 14:21:18 CEST 2010
r...@neo.vx.sk:/usr/obj/usr/
src/sys/GENERIC

pciconf:
n...@pci0:0:20:0:class=0x068000 card=0x816a1043 chip=0x026910de rev=0xa3
hdr=0x00
vendor = 'NVIDIA Corporation'
device = 'MCP51 Network Bus Enumerator'
class  = bridge
r...@pci0:1:0:0:class=0x02 card=0x816810ec chip=0x816810ec rev=0x01
hdr=0x00
vendor = 'Realtek Semiconductor'
device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
class  = network
subclass   = ethernet

rc.conf:
ifconfig_nfe0="inet 192.168.0.200 netmask 255.255.255.0"
defaultrouter="192.168.0.1"
ifconfig_re0="DHCP"

dmesg:
nfe0:  port 0xc800-0xc807 mem
0xfe02b000-0xfe02bfff irq 21 at device 20.0 on pci0
miibus1:  on nfe0
e1000phy0:  PHY 19 on miibus1
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
nfe0: Ethernet address: 00:1a:92:38:dc:95
nfe0: [FILTER]
re0:  port
0xac00-0xacff mem 0xfdbff000-0xfdbf irq 16 at device 0.0 on pci1
re0: Using 1 MSI messages
re0: Chip rev. 0x3800
re0: MAC rev. 0x
miibus0:  on re0
rgephy0:  PHY 1 on miibus0
rgephy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
re0: Ethernet address: d8:5d:4c:80:b4:88
re0: [FILTER]
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Problem with re0

2010-11-11 Thread Pyun YongHyeon
On Thu, Nov 11, 2010 at 09:56:26PM +0100, Gabor Radnai wrote:
> Hi,
> 
> I have an Asus M2NPV-VM motherboard with integrated Nvidia MCP51 Gigabit
> Ethernet NIC and
> TP-Link TG-3468 PCIe network card which is using Realtek 8111 chip.
> 
> I have problem with the re driver: the Nvidia network interface is working
> properly but the other
> though it seems recognized by OS I cannot use. Sporadically it remains down
> and if it gets up then
> does not get ip address via DHCP nor help if I set static ip address. Can
> manipulate via ifconfig but
> unreachable via IP.
> 
> I replaced cable, interchanged cable working with Nvidia, restarted
> switch/router but no luck so far.
> Also using this nic in a Windows machine - it works. Using my Asus mob with
> Ubuntu Live CD - card works.
> 
> Can it be a driver bug or this type of chip is not supported by re driver?
> 

Eh, you already know the answer: it is recognized by re(4) but does not
work, so it's a bug in re(4). Would you show me the output of
ifconfig re0 after bringing the interface up (i.e. ifconfig re0 up)?
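
Roughly the sequence to capture (illustrative; adjust as needed):

ifconfig re0 up
ifconfig re0                    # the media: and status: lines are the interesting part
dmesg | grep -E 're0|rgephy0'   # any link up/down or watchdog messages
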
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: ML370 G4 with poor Network Performance and high CPU Load

2010-11-11 Thread Pyun YongHyeon
On Thu, Nov 11, 2010 at 07:35:32PM +, r...@reckschwardt.de wrote:
>  Hello,
> 
> i am new in this Maillist and i use an ML370G4 with FreeBSD 8.1 AMD64. I 
> try with netio and TCP. The used Nics are onboard Broadcom 
> (PCI-X133Mhz), an Broadcom PCI-X Nic and an intel PCI-X Nic. The CPU 
> load is around 35% and the performance like this:
> 
> Packet size  1k bytes:  99303 KByte/s Tx,  44576 KByte/s Rx.
> Packet size  2k bytes:  72043 KByte/s Tx,  75200 KByte/s Rx.
> Packet size  4k bytes:  23280 KByte/s Tx,  66072 KByte/s Rx.
> Packet size  8k bytes:  55234 KByte/s Tx,  64470 KByte/s Rx.
> Packet size 16k bytes:  82485 KByte/s Tx,  74099 KByte/s Rx.
> Packet size 32k bytes:  93133 KByte/s Tx,  74992 KByte/s Rx.
> 

And you did perform the test on an idle system? (No disk activity, no
other network I/O, etc.)

Show me the dmesg output of verbose boot and output of "pciconf
-lcbv".

> I try the following tuning:
> 
> kern.ipc.maxsockbuf=16777216
> net.inet.tcp.sendbuf_max=16777216
> net.inet.tcp.recvbuf_max=16777216
> net.inet.tcp.sendbuf_inc=16384
> net.inet.tcp.recvbuf_inc=524288
> net.inet.tcp.inflight.enable=0
> net.inet.tcp.hostcache.expire=1
> 
> but this is not helpfull, the Load goes to 60% and the Performance is 
> also poor. How can i prevent this Problem?
> 
> thanks for response ré
> 
> 
> P.S. the same Computer with Linux runs perfect with Performance and 1-2% 
> Load,
> 
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]

2010-11-11 Thread Kevin Oberman
> From: Pyun YongHyeon 
> Date: Thu, 11 Nov 2010 13:04:36 -0800
> 
> On Thu, Nov 11, 2010 at 08:10:57AM -0800, Kevin Oberman wrote:
> > > Date: Wed, 10 Nov 2010 23:49:56 -0800 (PST)
> > > From: Kirill Yelizarov 
> > > 
> > > 
> > > 
> > > --- On Thu, 11/11/10, Kevin Oberman  wrote:
> > > 
> > > > From: Kevin Oberman 
> > > > Subject: Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]
> > > > To: "Wilkinson, Alex" 
> > > > Cc: freebsd-sta...@freebsd.org
> > > > Date: Thursday, November 11, 2010, 8:26 AM
> > > > > Date: Thu, 11 Nov 2010 13:01:26 +0800
> > > > > From: "Wilkinson, Alex" 
> > > > > Sender: owner-freebsd-sta...@freebsd.org
> > > > > 
> > > > > 
> > > > >     0n Wed, Nov 10, 2010 at 04:21:12AM -0800, Kirill Yelizarov wrote: 
> > > > > 
> > > > >     >All my em cards running 8.1 stable don't reply to icmp echo request packets larger than 1472 bytes.
> > > > >     >
> > > > >     >On stable 7.2 the same hardware works as expected:
> > > > >     ># ping -s 1500 192.168.64.99
> > > > >     >PING 192.168.64.99 (192.168.64.99): 1500 data bytes
> > > > >     >1508 bytes from 192.168.64.99: icmp_seq=0 ttl=63 time=1.249 ms
> > > > >     >1508 bytes from 192.168.64.99: icmp_seq=1 ttl=63 time=1.158 ms
> > > > >     >
> > > > >     >Here is the dump on em interface
> > > > >     >15:06:31.452043 IP 192.168.66.65 > *: ICMP echo request, id 28729, seq 5, length 1480
> > > > >     >15:06:31.452047 IP 192.168.66.65 > : icmp
> > > > >     >15:06:31.452069 IP  > 192.168.66.65: ICMP echo reply, id 28729, seq 5, length 1480
> > > > >     >15:06:31.452071 IP *** > 192.168.66.65: icmp
> > > > >     > 
> > > > >     >Same ping from same source (it's a 8.1 stable with fxp interface) to em card running 8.1 stable
> > > > >     >#pciconf -lv
> > > > >     >e...@pci0:3:4:0:    class=0x02 card=0x10798086 chip=0x10798086 rev=0x03 hdr=0x00
> > > > >     >    vendor     = 'Intel Corporation'
> > > > >     >    device     = 'Dual Port Gigabit Ethernet Controller (82546EB)'
> > > > >     >    class      = network
> > > > >     >    subclass   = ethernet
> > > > >     >
> > > > >     ># ping -s 1472 192.168.64.200
> > > > >     >PING 192.168.64.200 (192.168.64.200): 1472 data bytes
> > > > >     >1480 bytes from 192.168.64.200: icmp_seq=0 ttl=63 time=0.848 ms
> > > > >     >^C
> > > > >     >
> > > > >     ># ping -s 1473 192.168.64.200
> > > > >     >PING 192.168.64.200 (192.168.64.200): 1473 data bytes
> > > > >     >^C
> > > > >     >--- 192.168.64.200 ping statistics ---
> > > > >     >4 packets transmitted, 0 packets received, 100.0% packet loss
> > > > > 
> > > > > works fine for me:
> > > > > 
> > > > > FreeBSD 8.1-STABLE #0 r213395
> > > > > 
> > > > > e...@pci0:0:25:0:    class=0x02 card=0x3035103c chip=0x10de8086 rev=0x02 hdr=0x00
> > > > >     vendor     = 'Intel Corporation'
> > > > >     device     = 'Intel Gigabit network connection (82567LM-3 )'
> > > > >     class      = network
> > > > >     subclass   = ethernet
> > > > > 
> > > > > #ping -s 1473 host
> > > > > PING host(192.168.1.1): 1473 data bytes
> > > > > 1481 bytes from 192.168.1.1: icmp_seq=0 ttl=253 time=31.506 ms
> > > > > 1481 bytes from 192.168.1.1: icmp_seq=1 ttl=253 time=31.493 ms
> > > > > 1481 bytes from 192.168.1.1: icmp_seq=2 ttl=253 time=31.550 ms
> > > > > ^C
> > > > 
> > > > The reason the '-s 1500' worked was that the packets were fragmented. If I
> > > > add the '-D' option, '-s 1473' fails on v7 and v8. Are the v8 systems where
> > > > you see it failing without the '-D' on the same network segment? If not, it
> > > > is likely that an intervening device is refusing to fragment the packet.
> > > > (Some routers deliberately don't fragment ICMP Echo Request packets.)
> > > 
> > > If i set -D -s 1473 sender side refuses to ping and that is
> > > correct. All mentioned above machines are behind the same router and
> > > switch. Same hardware running v7 is working while v8 is not. And i
> > > never saw such problems before.  Also correct me if i'm wrong but the
> > > dump shows that the packet arrived. I'll try driver from head and will
> > > post here results.
> > 
> > I did a bit more looking at this today and I see that something bogus is
> > going on and it MAY be the em driver.
> > 
> > I tried 1473 data byte pings without the DF flag. I then captured the
> > packets on both ends (where the sending system has a bge (Broadcom GE)
> > and the responding end has an em (Intel) card.
> > 
> > What I saw was the fragmented IP packets all being received by the
> > system with the em interface and an ICMP Echo Reply being sent back,
> > again 

Re: ML370 G4 with poor Network Performance and high CPU Load

2010-11-11 Thread r...@reckschwardt.de

 Hello YongHyeon,

Yes, both test servers are in an idle state, with no disk activity and no
significant network traffic.


the pciconf -lcbv for the Nics:

e...@pci0:7:1:0: class=0x02 card=0x00db0e11 chip=0x10108086 rev=0x01 
hdr=0x00

vendor = 'Intel Corporation'
device = 'Dual Port Gigabit Ethernet Controller (Copper) 
(82546EB)'

class  = network
subclass   = ethernet
bar   [10] = type Memory, range 64, base 0xfdfe, size 131072, 
enabled
bar   [18] = type Memory, range 64, base 0xfdf8, size 262144, 
enabled

bar   [20] = type I/O Port, range 32, base 0x6000, size 64, enabled
cap 01[dc] = powerspec 2  supports D0 D3  current D0
cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
transaction

cap 05[f0] = MSI supports 1 message, 64 bit
e...@pci0:7:1:1: class=0x02 card=0x00db0e11 chip=0x10108086 rev=0x01 
hdr=0x00

vendor = 'Intel Corporation'
device = 'Dual Port Gigabit Ethernet Controller (Copper) 
(82546EB)'

class  = network
subclass   = ethernet
bar   [10] = type Memory, range 64, base 0xfdf6, size 131072, 
enabled

bar   [20] = type I/O Port, range 32, base 0x6040, size 64, enabled
cap 01[dc] = powerspec 2  supports D0 D3  current D0
cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
transaction

cap 05[f0] = MSI supports 1 message, 64 bit

If you need more info, please ask me ;-)

Thanks for your response, ré


On Thu, Nov 11, 2010 at 07:35:32PM +, r...@reckschwardt.de wrote:

  Hello,

i am new in this Maillist and i use an ML370G4 with FreeBSD 8.1 AMD64. I
try with netio and TCP. The used Nics are onboard Broadcom
(PCI-X133Mhz), an Broadcom PCI-X Nic and an intel PCI-X Nic. The CPU
load is around 35% and the performance like this:

Packet size  1k bytes:  99303 KByte/s Tx,  44576 KByte/s Rx.
Packet size  2k bytes:  72043 KByte/s Tx,  75200 KByte/s Rx.
Packet size  4k bytes:  23280 KByte/s Tx,  66072 KByte/s Rx.
Packet size  8k bytes:  55234 KByte/s Tx,  64470 KByte/s Rx.
Packet size 16k bytes:  82485 KByte/s Tx,  74099 KByte/s Rx.
Packet size 32k bytes:  93133 KByte/s Tx,  74992 KByte/s Rx.


And you did perform the test on idle system?(No disk activity, no
other network IOs etc).

Show me the dmesg output of verbose boot and output of "pciconf
-lcbv".


I try the following tuning:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.inflight.enable=0
net.inet.tcp.hostcache.expire=1

but this is not helpfull, the Load goes to 60% and the Performance is
also poor. How can i prevent this Problem?

thanks for response ré


P.S. the same Computer with Linux runs perfect with Performance and 1-2%
Load,


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"






Re: ML370 G4 with poor Network Performance and high CPU Load

2010-11-11 Thread r...@reckschwardt.de

 here is the pciconf for the onboard Nic

b...@pci0:7:3:0:class=0x02 card=0x00cb0e11 chip=0x16c714e4 
rev=0x10 hdr=0x00

vendor = 'Broadcom Corporation'
device = 'BCM5703A3 NetXtreme Gigabit Ethernet'
class  = network
subclass   = ethernet
bar   [10] = type Memory, range 64, base 0xfdef, size 65536, 
enabled
cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
transaction

cap 01[48] = powerspec 2  supports D0 D3  current D0
cap 03[50] = VPD
cap 05[58] = MSI supports 8 messages, 64 bit

regards ré



Re: ML370 G4 with poor Network Performance and high CPU Load

2010-11-11 Thread Pyun YongHyeon
On Thu, Nov 11, 2010 at 10:44:31PM +, r...@reckschwardt.de wrote:
>  Hello YongHyeon,
> 
> yes, booth Test-Servers are in idle State, no Disk activity and no 
> important Networktraffic.
> 
> the pciconf -lcbv for the Nics:
> 
> e...@pci0:7:1:0: class=0x02 card=0x00db0e11 chip=0x10108086 rev=0x01 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = 'Dual Port Gigabit Ethernet Controller (Copper) 
> (82546EB)'
> class  = network
> subclass   = ethernet
> bar   [10] = type Memory, range 64, base 0xfdfe, size 131072, 
> enabled
> bar   [18] = type Memory, range 64, base 0xfdf8, size 262144, 
> enabled
> bar   [20] = type I/O Port, range 32, base 0x6000, size 64, enabled
> cap 01[dc] = powerspec 2  supports D0 D3  current D0
> cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
> transaction
> cap 05[f0] = MSI supports 1 message, 64 bit
> e...@pci0:7:1:1: class=0x02 card=0x00db0e11 chip=0x10108086 rev=0x01 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = 'Dual Port Gigabit Ethernet Controller (Copper) 
> (82546EB)'
> class  = network
> subclass   = ethernet
> bar   [10] = type Memory, range 64, base 0xfdf6, size 131072, 
> enabled
> bar   [20] = type I/O Port, range 32, base 0x6040, size 64, enabled
> cap 01[dc] = powerspec 2  supports D0 D3  current D0
> cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
> transaction
> cap 05[f0] = MSI supports 1 message, 64 bit
> 
> if you need more Info please ask me ;-)
> 

Hmmm, I don't see any Broadcom controllers here. If you see issues
with em(4), Jack can help you. Note that the 82546EB is a really old
controller, and I also remember its performance was not great compared
to the PCIe versions.

> thanks for your responce ré
> 
> >On Thu, Nov 11, 2010 at 07:35:32PM +, r...@reckschwardt.de wrote:
> >>  Hello,
> >>
> >>i am new in this Maillist and i use an ML370G4 with FreeBSD 8.1 AMD64. I
> >>try with netio and TCP. The used Nics are onboard Broadcom
> >>(PCI-X133Mhz), an Broadcom PCI-X Nic and an intel PCI-X Nic. The CPU
> >>load is around 35% and the performance like this:
> >>
> >>Packet size  1k bytes:  99303 KByte/s Tx,  44576 KByte/s Rx.
> >>Packet size  2k bytes:  72043 KByte/s Tx,  75200 KByte/s Rx.
> >>Packet size  4k bytes:  23280 KByte/s Tx,  66072 KByte/s Rx.
> >>Packet size  8k bytes:  55234 KByte/s Tx,  64470 KByte/s Rx.
> >>Packet size 16k bytes:  82485 KByte/s Tx,  74099 KByte/s Rx.
> >>Packet size 32k bytes:  93133 KByte/s Tx,  74992 KByte/s Rx.
> >>
> >And you did perform the test on idle system?(No disk activity, no
> >other network IOs etc).
> >
> >Show me the dmesg output of verbose boot and output of "pciconf
> >-lcbv".
> >
> >>I try the following tuning:
> >>
> >>kern.ipc.maxsockbuf=16777216
> >>net.inet.tcp.sendbuf_max=16777216
> >>net.inet.tcp.recvbuf_max=16777216
> >>net.inet.tcp.sendbuf_inc=16384
> >>net.inet.tcp.recvbuf_inc=524288
> >>net.inet.tcp.inflight.enable=0
> >>net.inet.tcp.hostcache.expire=1
> >>
> >>but this is not helpfull, the Load goes to 60% and the Performance is
> >>also poor. How can i prevent this Problem?
> >>
> >>thanks for response ré
> >>
> >>
> >>P.S. the same Computer with Linux runs perfect with Performance and 1-2%
> >>Load,
> >>
> >___
> >freebsd-net@freebsd.org mailing list
> >http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
> >
> 
> 

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Updated ARP Queue Patch...

2010-11-11 Thread George Neville-Neil
Howdy,

After some excellent comments from Bjoern I've put together the following patch:

http://people.freebsd.org/~gnn/head-arpqueue4.diff

Please review and comment.
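
One way to try it against a HEAD checkout (an illustrative recipe; the -p
level may need adjusting depending on how the diff was generated):

cd /usr/src
fetch http://people.freebsd.org/~gnn/head-arpqueue4.diff
patch -p0 < head-arpqueue4.diff
# then rebuild and install the kernel as usual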

Best,
George

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: NFS + FreeBSD TCP Behavior with Linux NAT

2010-11-11 Thread Lawrence Stewart
On 11/12/10 07:39, Julian Elischer wrote:
> On 11/11/10 6:36 AM, Christopher Penney wrote:
>> Hi,
>>
>> I have a curious problem I'm hoping someone can help with or at least
>> educate me on.
>>
>> I have several large Linux clusters and for each one we hide the compute
>> nodes behind a head node using NAT.  Historically, this has worked
>> very well
>> for us and any time a NAT gateway (the head node) reboots everything
>> recovers within a minute or two of it coming back up.  This includes NFS
>> mounts from Linux and Solaris NFS servers, license server connections,
>> etc.
>>
>> Recently, we added a FreeBSD based NFS server to our cluster resources
>> and
>> have had significant issues with NFS mounts hanging if the head node
>> reboots.  We don't have this happen much, but it does occasionally
>> happen.
>>   I've explored this and it seems the behavior of FreeBSD differs a
>> bit from
>> at least Linux and Solaris with respect to TCP recovery.  I'm curious if
>> someone can explain this or offer any workarounds.
>>
>> Here are some specifics from a test I ran:
>>
>> Before the reboot two Linux clients were mounting the FreeBSD server. 
>> They
>> were both using port 903 locally.  On the head node clientA:903 was
>> remapped
>> to headnode:903 and clientB:903 was remapped to headnode:601.  There
>> is no
>> activity when the reboot occurs.  The head node takes a few minutes to
>> come
>> back up (we kept it down for several minutes).
>>
>> When it comes back up clientA and clientB try to reconnect to the FreeBSD
>> NFS server.  They both use the same source port, but since the head
>> node's
>> conntrack table is cleared it's a race to see who gets what port and this
>> time clientA:903 appears as headnode:601 and clientB:903 appears as
>> headnode:903 (>>>  they essentially switch places as far as the FreeBSD
>> server would see<<<  ).
>>
>> The FreeBSD NFS server, since there was no outstanding acks it was
>> waiting
>> on, thinks things are ok so when it gets a SYN from the two clients it
>> only
>> responds with an ACK.  The ACK for each that it replies with is bogus
>> (invalid seq number) because it's using the return path the other
>> client was
>> using before the reboot so the client sends a RST back, but it never
>> gets to
>> the FreeBSD system since the head node's NAT hasn't yet seen the full
>> handshake (that would allow return packets).  The end result is a
>> "permanent" hang (at least until it would otherwise cleanup idle TCP
>> connections).
>>
>> This is in stark contrast to the behavior of the other systems we have.
>>   Other systems respond to the SYN used to reconnect with a SYN/ACK. 
>> They
>> appear to implicitly tear down the return path based on getting a SYN
>> from a
>> seemingly already established connection.
>>
>> I'm assuming this is one of the grey areas where there is no specific
>> behavior outlined in an RFC?  Is there any way to make the FreeBSD system
>> more reliable in this situation (like making it implicitly tear down the
>> return)?  Or is there a way to adjust the NAT setup to allow the RST to
>> return to the FreeBSD system?  Currently, NAT is setup with simply:
>>
>> iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to
>> 1.2.3.4
>>
>> Where 1.2.3.4 is the intranet address and 10.1.0.0 is the cluster
>> network.
> 
> I just added NFS to the subject because the NFS people are thise you
> need to
> connect with.

Skimming Chris' problem description, I don't think this is an NFS issue;
I agree with Chris that it's netstack-related behaviour as opposed to
application-related.

Chris, I have minimal cycles at the moment and your scenario is bending
my brain a little bit too much to give a quick response. A tcpdump
excerpt showing such an exchange would be very useful. I'll try to come
back to it when I have a sec. Andre, do you have a few cycles to
digest this in more detail?
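
Something along these lines on the head node would probably do (the interface
name comes from the iptables rule above; port 2049 assumes plain NFS over TCP,
and the server address is a placeholder):

tcpdump -n -s 0 -i bond0 -w nfs-reconnect.pcap host <freebsd-server> and port 2049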

Cheers,
Lawrence
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: NFS + FreeBSD TCP Behavior with Linux NAT

2010-11-11 Thread Andre Oppermann

On 12.11.2010 03:29, Lawrence Stewart wrote:

On 11/12/10 07:39, Julian Elischer wrote:

On 11/11/10 6:36 AM, Christopher Penney wrote:

Hi,

I have a curious problem I'm hoping someone can help with or at least
educate me on.

I have several large Linux clusters and for each one we hide the compute
nodes behind a head node using NAT.  Historically, this has worked
very well
for us and any time a NAT gateway (the head node) reboots everything
recovers within a minute or two of it coming back up.  This includes NFS
mounts from Linux and Solaris NFS servers, license server connections,
etc.

Recently, we added a FreeBSD based NFS server to our cluster resources
and
have had significant issues with NFS mounts hanging if the head node
reboots.  We don't have this happen much, but it does occasionally
happen.
   I've explored this and it seems the behavior of FreeBSD differs a
bit from
at least Linux and Solaris with respect to TCP recovery.  I'm curious if
someone can explain this or offer any workarounds.

Here are some specifics from a test I ran:

Before the reboot two Linux clients were mounting the FreeBSD server.
They
were both using port 903 locally.  On the head node clientA:903 was
remapped
to headnode:903 and clientB:903 was remapped to headnode:601.  There
is no
activity when the reboot occurs.  The head node takes a few minutes to
come
back up (we kept it down for several minutes).

When it comes back up clientA and clientB try to reconnect to the FreeBSD
NFS server.  They both use the same source port, but since the head
node's
conntrack table is cleared it's a race to see who gets what port and this
time clientA:903 appears as headnode:601 and clientB:903 appears as
headnode:903 (>>>   they essentially switch places as far as the FreeBSD
server would see<<<   ).

The FreeBSD NFS server, since there was no outstanding acks it was
waiting
on, thinks things are ok so when it gets a SYN from the two clients it
only
responds with an ACK.  The ACK for each that it replies with is bogus
(invalid seq number) because it's using the return path the other
client was
using before the reboot so the client sends a RST back, but it never
gets to
the FreeBSD system since the head node's NAT hasn't yet seen the full
handshake (that would allow return packets).  The end result is a
"permanent" hang (at least until it would otherwise cleanup idle TCP
connections).

This is in stark contrast to the behavior of the other systems we have.
   Other systems respond to the SYN used to reconnect with a SYN/ACK.
They
appear to implicitly tear down the return path based on getting a SYN
from a
seemingly already established connection.

I'm assuming this is one of the grey areas where there is no specific
behavior outlined in an RFC?  Is there any way to make the FreeBSD system
more reliable in this situation (like making it implicitly tear down the
return)?  Or is there a way to adjust the NAT setup to allow the RST to
return to the FreeBSD system?  Currently, NAT is setup with simply:

iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to
1.2.3.4

Where 1.2.3.4 is the intranet address and 10.1.0.0 is the cluster
network.


I just added NFS to the subject because the NFS people are thise you
need to
connect with.


Skimming Chris' problem description, I don't think I agree that this is
an NFS issue and agree with Chris that it's netstack related behaviour
as opposed to application related.

Chris, I have minimal cycles at the moment and your scenario is bending
my brain a little bit too much to give a quick response. A tcpdump
excerpt showing such an exchange would be very useful. I'll try come
back to it when I I have a sec. Andre, do you have a few cycles to
digest this in more detail?


I have had very few cycles since EuroBSDCon as well, but this weekend my
little son has a sleepover at my mother-in-law's and my wife is at
work.  So I'm going to reduce my FreeBSD backlog.  There are a few
things that have queued up.  I should get enough time to take care
of this one.

--
Andre
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Problem with re0

2010-11-11 Thread Zeus V Panchenko
Hi,

Gabor Radnai (gabor.rad...@gmail.com) [10.11.11 23:22] wrote:
> pciconf:
> n...@pci0:0:20:0:class=0x068000 card=0x816a1043 chip=0x026910de rev=0xa3
> hdr=0x00
> vendor = 'NVIDIA Corporation'
> device = 'MCP51 Network Bus Enumerator'
> class  = bridge
> r...@pci0:1:0:0:class=0x02 card=0x816810ec chip=0x816810ec rev=0x01
> hdr=0x00
> vendor = 'Realtek Semiconductor'
> device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
> class  = network
> subclass   = ethernet
> 

I have the same problem (I wrote about it here before, but still no idea)
with the onboard Realtek NIC (Gigabit Ethernet NIC (NDIS 6.0),
RTL8168/8111/8111c), while a NIC from another vendor using the same driver
(D-Link DGE-528T Gigabit adaptor (dlg10086)) works fine ...

The flapping of the Realtek interface is so drastic that I was forced to
unplug the cable, even though that NIC had no IP configured.

I was sure it is a problem of the onboard Realtek NICs ...

uname:
FreeBSD 8.1-STABLE #3 amd64

pciconf -lv:
r...@pci0:2:0:0: class=0x02 card=0x83a31043 chip=0x816810ec rev=0x03 
hdr=0x00
vendor = 'Realtek Semiconductor'
device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
class  = network
subclass   = ethernet
r...@pci0:1:0:0: class=0x02 card=0x43001186 chip=0x43001186 rev=0x10 
hdr=0x00
vendor = 'D-Link System Inc'
device = 'Used on DGE-528T Gigabit adaptor (dlg10086)'
class  = network
subclass   = ethernet

dmidecode:
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK Computer INC.
Product Name: AT5NM10-I
Version: Rev x.0x
Serial Number: MT7006K15200628
Asset Tag: To Be Filled By O.E.M.
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: To Be Filled By O.E.M.
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0

Handle 0x0012, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: LAN
Internal Connector Type: None
External Reference Designator: LAN
External Connector Type: RJ-45
Port Type: Network Port

-- 
Zeus V. Panchenko
IT Dpt., IBS ltdGMT+2 (EET)
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"