Re: Removal of deprecated implied connect for TCP

2010-09-13 Thread Andre Oppermann

Based on the feedback I withdraw the proposal to remove implied connect
from TCP.  Instead I will look at it closer and fix any loose ends that
may have come from other changes in the TCP code.

Many good points have been raised and I will repeat them here for
the archives:

 o In FreeBSD most, if not all, protocols support implied connect;
   removing it from TCP would make TCP an outlier.
 o It is being used by at least one product based on FreeBSD.
 o It can speed up the data-sending phase by sending data on the
   ACK after the SYN-ACK.  [RFC1644 went further and sent data on the
   initial SYN, but no one accepts that anymore.]

It is important to note, though, that implied connect in TCP is
non-standard and no other even remotely popular OS supports it.
Thus any applications making use of it are non-portable.
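
For reference, the non-portable pattern looks roughly like this (a
minimal sketch with error handling omitted; req/reqlen and dst, a
filled-in struct sockaddr_in for the peer, are assumed):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int s = socket(PF_INET, SOCK_STREAM, 0);

    /* Implied connect: no connect(2) call; the first sendto(2) both
     * sets up the connection and queues the data.  MSG_EOF (the old
     * T/TCP usage) additionally sends FIN once the data is out. */
    sendto(s, req, reqlen, MSG_EOF, (struct sockaddr *)&dst, sizeof(dst));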

--
Andre

On 11.09.2010 17:38, Randall Stewart wrote:

All:

One thing to note.. when you can do an implied connection setup, the
3-way handshake has the potential to carry data (I don't know if TCP
does in FreeBSD) on its third leg.

This is one of the reasons SCTP uses this.. since we often will carry
data on the third and even possibly the 4th leg of the handshake (we
have one extra leg).

Taking this feature out of TCP will make us like all other OSes, and
the socket semantics will prevent you from sending data on the third
leg:

---SYN-->
<---SYN-ACK---
---ACK-->
---DATA-->

instead of

---SYN-->
<---SYN-ACK---
---ACK+DATA-->


In the past I have mentioned in classes I teach that TCP is capable of
this, but the OSes of the world do not allow this latter behavior..

Just thoughts and ramblings ;-)

R


On Sep 10, 2010, at 2:51 PM, Karim Fodil-Lemelin wrote:


On 31/08/2010 5:32 PM, Robert Watson wrote:


On Tue, 31 Aug 2010, Andre Oppermann wrote:


I'm not entirely comfortable with this change, and would like a chance to 
cogitate on it a bit
more. While I'm not aware of any applications depending on the semantic for 
TCP, I know that we
do use it for UNIX domain sockets.


I don't have any plans to remove the implied connect support from the socket 
layer or other
protocols, only from TCP.


Right -- the implicit question is: why should TCP be the only stream protocol 
in our stack *not*
to support implied connection, when we plan to continue to support it for all 
other protocols?


As for deprecating this part of the TCP API: there is no documentation
of the implied connect in tcp(4).  sendto(2) doesn't differentiate
between protocols and simply says: "... sendto() and sendmsg() may be
used at any time."  For MSG_EOF it says that it is only supported for
SOCK_STREAM sockets in the PF_INET protocol family.  These sentences
would have to be corrected.


In general, deprecating is taken to mean providing significant and explicit 
advance warning of
removal -- for example, updating the 8.x man page to point out that the feature 
is deprecated and
it will not appear in future releases of FreeBSD.

Robert


Hi,

For what it's worth, we at Xiphos (now XipLink) are still using sendto()
with T/TCP, and it is one of the reasons we chose FreeBSD more than 10
years ago!

Best regards,

Karim.



--
Randall Stewart
803-317-4952 (cell)







Re: Removal of deprecated implied connect for TCP

2010-09-13 Thread Andre Oppermann

On 11.09.2010 17:38, Randall Stewart wrote:

All:

One thing to note.. when you can do an implied connection setup, the
3-way handshake has the potential to carry data (I don't know if TCP
does in FreeBSD) on its third leg.

This is one of the reasons SCTP uses this.. since we often will carry
data on the third and even possibly the 4th leg of the handshake (we
have one extra leg).

Taking this feature out of TCP will make us like all other OSes, and
the socket semantics will prevent you from sending data on the third
leg:

---SYN-->
<---SYN-ACK---
---ACK-->
---DATA-->

instead of

---SYN-->
<---SYN-ACK---
---ACK+DATA-->


In the past I have mentioned in classes I teach that TCP is capable of
this, but the OSes of the world do not allow this latter behavior..


The savings in TCP for the case you describe here are not that great.
With piggy-backing data on the third leg you save one (small) ACK packet
and one round trip to userspace for the application to start sending
data.  Even without it, there is no need to wait for a full network
round-trip time to start sending data.

The real savings from implied connect with RFC1644 came from sending
data together with the initial SYN.  The receiving side would either
queue the data until the 3WHS was complete, or on later invocations
directly create a socket upon SYN.  The trouble with the way RFC1644
did it was the very weak protection against very simple DoS attacks.
Only a connection count variable was used to prevent fake SYNs from
opening new sockets.  This, plus the required socket layer changes (the
implied connect), caused a quick halt to any further RFC1644 adoption.

--
Andre


Current problem reports assigned to freebsd-net@FreeBSD.org

2010-09-13 Thread FreeBSD bugmaster
Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker  Resp.  Description

o kern/150257  net[msk] watchdog timeout
o kern/150251  net[patch] [ixgbe] Late cable insertion broken
o kern/150249  net[ixgbe] Media type detection broken
o kern/150247  net[patch] [ixgbe] Version in -current won't build on 7.x
o bin/150224   netppp does not reassign static IP after kill -KILL comma
o kern/150148  net[ath] Atheros 5424/2424 - AR2425 stopped working with 
o kern/150052  netwi(4) driver does not work with wlan(4) driver for Luc
f kern/149969  net[wlan] [ral] ralink rt2661 fails to maintain connectio
o kern/149937  net[ipfilter] [patch] kernel panic in ipfilter IP fragmen
o kern/149804  net[icmp] [panic] ICMP redirect on causes "panic: rtqkill
o kern/149786  net[bwn] bwn on Dell Inspiron 1150: connections stall
o kern/149643  net[rum] device not sending proper beacon frames in ap mo
o kern/149609  net[panic] reboot after adding second default route
o kern/149539  net[ath] atheros ar9287 is not supported by ath_hal
o kern/149516  net[ath] ath(4) hostap with fake MAC/BSSID results in sta
o kern/149373  net[realtek/atheros]: None of my network card working
o kern/149307  net[ath] Doesn't work Atheros 9285
o kern/149306  net[alc] Doesn't work Atheros AR8131 PCIe Gigabit Etherne
o kern/149117  net[inet] [patch] in_pcbbind: redundant test
o kern/149086  net[multicast] Generic multicast join failure in 8.1
o kern/148862  net[panic] page fault while in kernel mode at _mtx_lock_s
o kern/148322  net[ath] Triggering atheros wifi beacon misses in hostap 
o kern/148317  net[ath] FreeBSD 7.x hostap memory leak in net80211 or At
o kern/148078  net[ath] wireless networking stops functioning
o kern/147985  net[alc] alc network driver + tso ( + vlan ? ) does not w
o kern/147894  net[ipsec] IPv6-in-IPv4 does not work inside an ESP-only 
o kern/147862  net[wpi] Possible bug in the wpi driver.  Network Manager
o kern/147155  net[ip6] setfb not work with ipv6
o kern/146909  net[rue] rue(4) does not detect OQO model01 network contr
o kern/146845  net[libc] close(2) returns error 54 (connection reset by 
o kern/146792  net[flowtable] flowcleaner 100% cpu's core load
o kern/146759  net[cxgb] [patch] cxgb panic calling cxgb_set_lro() witho
o kern/146719  net[pf] [panic] PF or dumynet kernel panic
o kern/146534  net[icmp6] wrong source address in echo reply
o kern/146517  net[ath] [wlan] device timeouts for ath wlan device on re
o kern/146427  net[mwl] Additional virtual access points don't work on m
o kern/146426  net[mwl] 802.11n rates not possible on mwl
o kern/146425  net[mwl] mwl dropping all packets during and after high u
f kern/146394  net[vlan] IP source address for outgoing connections
o bin/146377   net[ppp] [tun] Interface doesn't clear addresses when PPP
o kern/146358  net[vlan] wrong destination MAC address
o kern/146165  net[wlan] [panic] Setting bssid in adhoc mode causes pani
o kern/146082  net[ng_l2tp] a false invaliant check was performed in ng_
o kern/146037  net[panic] mpd + CoA = kernel panic
o bin/145934   net[patch] add count option to netstat(1)
o kern/145826  net[ath] Unable to configure adhoc mode on ath0/wlan0
o kern/145825  net[panic] panic: soabort: so_count
o kern/145777  net[wpi] Intel 3945ABG driver breaks the connection after
o kern/145728  net[lagg] Stops working lagg between two servers.
o amd64/145654 netamd64-curent memory leak in kernel
o kern/144987  net[wpi] [panic] injecting packets with wlaninject using 
o kern/144882  netMacBookPro =>4.1 does not connect to BSD in hostap wit
o kern/144874  net[if_bridge] [patch] if_bridge frees mbuf after pfil ho
o conf/144700  net[rc.d] async dhclient breaks stuff for too many people
o kern/144642  net[rum] [panic] Enabling rum interface causes panic
o kern/144616  net[nat] [panic] ip_nat panic FreeBSD 7.2
o kern/144572  net[carp] CARP preemption mode traffic partially goes to 
f kern/144315  net[ipfw] [panic] freebsd 8-stable reboot after add ipfw 
o kern/143939  net[ipfw] [em] ipfw nat and em interface rxcsum problem
o kern/143874  net[wpi] Wireless 3945ABG error. wpi0 could not allocate 
o kern/143868  net[ath] [patch] [request] allow Atheros watchdog timeout

TCP loopback socket fusing

2010-09-13 Thread Andre Oppermann

When a TCP connection is made via loopback to localhost, the whole
send, segmentation and receive path (though with larger packets) is
still executed.  This has considerable overhead.

To short-circuit the send and receive sockets on localhost TCP connections
I've made a proof-of-concept patch that directly places the data in the
other side's socket buffer, without doing any packetization and other
protocol overhead (like UNIX domain sockets).  The connection setup (SYN,
SYN-ACK, ACK) and shutdown are still handled by normal TCP segments via
loopback so that firewalling still works.  The actual payload data during
the session won't be seen and the sequence numbers don't move other than
for SYN and FIN.  The sequence numbers remain valid, though.  Obviously
tcpdump won't see any data transfers either if the connection has fused
sockets.

Preliminary testing (with WITNESS and INVARIANTS enabled) has shown stable
operation and a rough doubling of the throughput on loopback connections.
I've tested most socket teardown cases and it behaves fine.  I'm not
entirely sure I've got all possible paths, but the way it is integrated
should properly defuse the sockets in all situations.
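
Conceptually the fused data path looks something like the sketch below.
This is not the actual patch: "so_fused_peer" is a made-up field, and
the real diff additionally has to deal with lock ordering, socket
buffer limits and teardown.

    /* Sketch only.  Append the payload directly to the peer's receive
     * buffer, skipping tcp_output(), IP and the loopback interface,
     * much like UNIX domain sockets do. */
    static int
    tcp_fused_send(struct socket *so, struct mbuf *m)
    {
            struct socket *peer = so->so_fused_peer;  /* hypothetical */

            SOCKBUF_LOCK(&peer->so_rcv);
            sbappendstream_locked(&peer->so_rcv, m);
            sorwakeup_locked(peer);  /* wakes the reader, drops the lock */
            return (0);
    }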

Testers and feedback wanted:

 http://people.freebsd.org/~andre/tcp_loopfuse-20100913.diff

--
Andre



Re: Removal of deprecated implied connect for TCP

2010-09-13 Thread Lars Eggert
Hi,

On 2010-8-29, at 16:22, Andre Oppermann wrote:
> T/TCP was ill-defined and had major security issues and never gained
> any support. It has been defunct in FreeBSD and most code has been
> removed about 6 years ago.

we're also about to declare the T/TCP RFCs Historic. See 
http://tools.ietf.org/html/draft-eggert-tcpm-historicize (which is a work item 
in the TCPM working group despite not being a draft-ietf-... at the moment.)

Lars


Re: TCP loopback socket fusing

2010-09-13 Thread Andre Oppermann

On 13.09.2010 14:45, Poul-Henning Kamp wrote:

In message<4c8e0c1e.2020...@networx.ch>, Andre Oppermann writes:


To short-circuit the send and receive sockets on localhost TCP connections
I've made a proof-of-concept patch that directly places the data in the
other side's socket buffer without doing any packetization and other protocol
overhead [...]


Can we keep the option (sysctl ?) of doing the full packet thing, it is
a very convenient debugging tool...


Yes, an appropriate sysctl is already contained in the patch (w/o man
page documentation yet).

--
Andre


Re: TCP loopback socket fusing

2010-09-13 Thread Poul-Henning Kamp
In message <4c8e0c1e.2020...@networx.ch>, Andre Oppermann writes:

>To short-circuit the send and receive sockets on localhost TCP connections
>I've made a proof-of-concept patch that directly places the data in the
>other side's socket buffer without doing any packetization and other protocol
>overhead [...]

Can we keep the option (sysctl ?) of doing the full packet thing, it is
a very convenient debugging tool...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


What about net.isr ?

2010-09-13 Thread Marcos Vinícius Buzo
Hi all.

I have a dual Intel Xeon E5506 box running mpd5, dummynet and pf.
Sometimes I get about 500+ PPPoE connections to this machine; the
network traffic goes to 30 Mbps and CPU usage hits 100%.  I would like
to know whether netisr would help me use the other processor cores, and
where I can find docs about it.
My network card is a dual-port Broadcom NetXtreme II BCM5709.

Thanks in advance


Re: bge hangs on recent 7.3-STABLE

2010-09-13 Thread Igor Sysoev
On Fri, Sep 10, 2010 at 07:39:15AM +0400, Igor Sysoev wrote:

> On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote:
> 
> > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote:
> > > Hi,
> > > 
> > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on 
> > > 11.01.2010
> > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s
> > > without issues. One of them, however, is loaded more than others, so it
> > > processes 20K/20K packets/s.
> > > 
> > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010.
> > > Then bge on this host hung two times. I was able to restart it from
> > > console using:
> > >   /etc/rc.d/netif restart bge0
> > > 
> > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, 
> > > 07.09.2010.
> > > After reboot bge hung every several seconds. I was able to restart it,
> > > but bge hung again after several seconds.
> > > 
> > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since there
> > > were several if_bge.c commits on 15.08.2010. The same hangs.
> > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before
> > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs.
> > > 
> > > The hosts are amd64 dual core SMP with 4G machines. bge information:
> > > 
> > > b...@pci0:4:0:0:class=0x02 card=0x165914e4 chip=0x165914e4 
> > > rev=0x11 hdr=0x00
> > > vendor = 'Broadcom Corporation'
> > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
> > > 
> > > bge0:  > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4
> > > miibus1:  on bge0
> > > brgphy0:  PHY 1 on miibus1
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-FDX, auto
> > > bge0: Ethernet address: 00:e0:81:5f:6e:8a
> > > 
> > 
> > Could you show me verbose boot message(bge part only)?
> > Also show me the output of "pciconf -lcbv".
> 
> Here is "pciconf -lcbv", I will send the "boot -v" part later.
> 
> b...@pci0:4:0:0:  class=0x02 card=0x165914e4 chip=0x165914e4 rev=0x11 
> hdr=0x00
> vendor = 'Broadcom Corporation'
> device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
> class  = network
> subclass   = ethernet
> bar   [10] = type Memory, range 64, base 0xfe5f, size 65536, enabled
> cap 01[48] = powerspec 2  supports D0 D3  current D0
> cap 03[50] = VPD
> cap 05[58] = MSI supports 8 messages, 64 bit 
> cap 10[d0] = PCI-Express 1 endpoint max data 128(128) link x1(x1)

Sorry for the delay.  Here is the "boot -v" part.  It is from another
host, but that host hangs too:

pci4:  on pcib4
pci4: domain=0, physical bus=4
found-> vendor=0x14e4, dev=0x1659, revid=0x11
domain=0, bus=4, slot=0, func=0
class=02-00-00, hdrtype=0x00, mfdev=0
cmdreg=0x0006, statreg=0x0010, cachelnsz=8 (dwords)
lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
intpin=a, irq=5
powerspec 2  supports D0 D3  current D0
MSI supports 8 messages, 64 bit
map[10]: type Memory, range 64, base 0xfe5f, size 16, enabled
pcib4: requested memory range 0xfe5f-0xfe5f: good
pcib0: matched entry for 0.13.INTA (src \_SB_.PCI0.APC4:0)
pcib0: slot 13 INTA routed to irq 19 via \_SB_.PCI0.APC4
pcib4: slot 0 INTA is routed to irq 19
pci0:4:0:0: bad VPD cksum, remain 14
bge0:  mem 0
xfe5f-0xfe5f irq 19 at device 0.0 on pci4
bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfe5f
bge0: CHIP ID 0x4101; ASIC REV 0x04; CHIP REV 0x41; PCI-E
miibus1:  on bge0
brgphy0:  PHY 1 on miibus1
brgphy0: OUI 0x000818, model 0x0018, rev. 0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge0: bpf attached
bge0: Ethernet address: 00:e0:81:5c:64:85
ioapic0: routing intpin 19 (PCI IRQ 19) to vector 54
bge0: [MPSAFE]
bge0: [ITHREAD]


-- 
Igor Sysoev
http://sysoev.ru/en/


Re: bge hangs on recent 7.3-STABLE

2010-09-13 Thread Igor Sysoev
On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote:

> On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote:
> > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote:
> > > Hi,
> > > 
> > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on 
> > > 11.01.2010
> > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s
> > > without issues. One of them, however, is loaded more than others, so it
> > > processes 20K/20K packets/s.
> > > 
> > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010.
> > > Then bge on this host hung two times. I was able to restart it from
> > > console using:
> > >   /etc/rc.d/netif restart bge0
> > > 
> > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, 
> > > 07.09.2010.
> > > After reboot bge hung every several seconds. I was able to restart it,
> > > but bge hung again after several seconds.
> > > 
> > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since there
> > > were several if_bge.c commits on 15.08.2010. The same hangs.
> > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before
> > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs.
> > > 
> > > The hosts are amd64 dual core SMP with 4G machines. bge information:
> > > 
> > > b...@pci0:4:0:0:class=0x02 card=0x165914e4 chip=0x165914e4 
> > > rev=0x11 hdr=0x00
> > > vendor = 'Broadcom Corporation'
> > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
> > > 
> > > bge0:  > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4
> > > miibus1:  on bge0
> > > brgphy0:  PHY 1 on miibus1
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-FDX, auto
> > > bge0: Ethernet address: 00:e0:81:5f:6e:8a
> > > 
> > 
> > Could you show me verbose boot message(bge part only)?
> > Also show me the output of "pciconf -lcbv".
> > 
> 
> Forgot to send a patch. Let me know whether attached patch fixes
> the issue or not.

> Index: sys/dev/bge/if_bge.c
> ===
> --- sys/dev/bge/if_bge.c  (revision 212341)
> +++ sys/dev/bge/if_bge.c  (working copy)
> @@ -3386,9 +3386,11 @@
>   sc->bge_rx_saved_considx = rx_cons;
>   bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx);
>   if (stdcnt)
> - bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std);
> + bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std +
> + BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT);
>   if (jumbocnt)
> - bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo);
> + bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo +
> + BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT);
>  #ifdef notyet
>   /*
>* This register wraps very quickly under heavy packet drops.

Thank you, it seems the patch has fixed the bug.
BTW, I noticed the same hangs on FreeBSD 8.1, date=2010.09.06.23.59.59
I will apply the patch on all my updated hosts.


-- 
Igor Sysoev
http://sysoev.ru/en/


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Tom Judge
On 09/09/2010 07:24 PM, Pyun YongHyeon wrote:
> On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote:
>   
>> Hi,
>> I am just following up on the thread from March (I think) about this issue.
>>
>> We are seeing this issue on a number of systems running 7.1. 
>>
>> The systems in question are all Dell:
>>
>> * R710 R610 R410
>> * PE2950
>>
>> The latter do not show the issue as much as the R series systems.
>>
>> The cards in one of the R610's that I am testing with are:
>>
>> b...@pci0:1:0:0:class=0x02 card=0x02361028 chip=0x163914e4
>> rev=0x20 hdr=0x00
>> vendor = 'Broadcom Corporation'
>> device = 'NetXtreme II BCM5709 Gigabit Ethernet'
>> class  = network
>> subclass   = ethernet
>>
>> They are connected to Dell PowerConnect 5424 switches.
>>
>> uname -a:
>> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4
>> #3: Wed Sep  8 08:19:03 UTC 2010
>> t...@dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10  amd64
>>
>> We are also using 8192-byte jumbo frames, if_lagg and if_vlan in the
>> configuration (the NICs are in promisc as we are currently capturing
>> netflow data on another VLAN for diagnostic purposes):
>>
>>
>> 

>> I have updated the bce driver and the Broadcom MII driver to the
>> version from stable/7 and am still seeing the issue.
>>
>> This morning I did a test with increasing the RX_PAGES to 8 but the
>> system just hung starting the network.  The route command got stuck in a
>> zone state (Sorry can't remember exactly which).
>>
>> The real question is, how do we go about increasing the number of RX
>> BDs? I guess we have to bump more than just RX_PAGES...
>>
>>
>> The cause for us, from what we can see, is the openldap server sending
>> large group search results back to nss_ldap or pam_ldap.  When it does
>> this it seems to send each of the 600 results in its own TCP segment
>> creating a small packet storm (600*~100byte PDU's) at the destination
>> host.  The kernel then retransmits 2 blocks of 100 results each after
>> SACK kicks in for the data that was dropped by the NIC.
>>
>>
>> Thanks in advance
>>
>> Tom
>>
>>
>> 

> FW may drop incoming frames when it does not see available RX
> buffers. Increasing the number of RX buffers slightly reduces the
> possibility of dropping frames but it wouldn't completely fix it.
> Alternatively the driver may announce available RX buffers in the
> middle of RX ring processing instead of giving updated buffers at the
> end of RX processing. This way FW may see available RX buffers while
> the driver/upper stack is busy processing received frames. But this
> may introduce coherency issues because the RX ring is shared between
> host and FW. If FreeBSD had a way to sync a partial region of a DMA
> map, this could be implemented without fear of coherency issues.
> Another way to improve RX performance would be switching to multiple
> RX queues with RSS but that would require a lot of work and I had no
> time to implement it.
>   

Does this mean that these cards are going to perform badly? This is
what I gathered from the previous thread.

> BTW, given that you've updated to bce(4)/mii(4) of stable/7, I
> wonder why TX/RX flow controls were not kicked in.
>   

The working copy I used for grabbing the upstream source is at r212371.

Last changes for the directories in my working copy:

sys/dev/bce @  211388
sys/dev/mii @ 212020


I discovered that flow control was disabled on the switches, so I set it
to auto and added a pair of BCE_PRINTFs in the code where it enables
and disables flow control, and now it gets enabled.


Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
of errors; however, the rate seems to be reduced compared to the
previous version of the driver.

Tom




-- 
TJU13-ARIN



Re: What about net.isr ?

2010-09-13 Thread Shtorm
On Mon, 2010-09-13 at 10:18 -0300, Marcos Vinícius Buzo wrote:
> Hi all.
> 
> I have a dual Intel Xeon E5506 box running mpd5, dummynet and pf. Sometimes
> i get about 500+ pppoe connections to this machine, the network traffic goes
> to 30mbps and CPU usage hits 100%. I would like to know if netisr would help
> me using the other processor cores, and where I can get docs about it.
> My network card is a dual port Broadcom NetXtreme II BCM5709.
> 
> Thanks in advance

In the case of a PPPoE server netisr did not give me any benefit; it
tried to handle all traffic with one thread. I have no idea why; maybe
it was a mistake in my setup.

Check what your Broadcom card can do. If it has MSI-X with multiple
vectors it is better to set the number of vectors to the number of
cores (maybe cores - 1) and set the sysctls net.isr.direct=1 and
net.isr.direct_force=1. In this case traffic processing will be divided
between CPU cores and your router will feel much better. I'm using
Intel network cards on a single Xeon 5620 for PPPoE + dummynet + NAT;
the box handles up to 70 kpps of traffic and 800+ connections. Also,
megabits/s do not matter for routers; packets/s is what creates real
CPU load.
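
For reference, the suggested settings would go into /etc/sysctl.conf as
below (net.isr.direct and net.isr.direct_force exist on FreeBSD 8;
check "sysctl net.isr" on your release first):

    # process inbound packets in the context they arrive in, instead
    # of queueing everything to a single netisr thread
    net.isr.direct=1
    net.isr.direct_force=1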





Re: net.inet.tcp.slowstart_flightsize in 8-STABLE

2010-09-13 Thread Maxim Dounin
Hello!

On Fri, Aug 06, 2010 at 10:56:40AM +0200, Andre Oppermann wrote:

> On 13.07.2010 16:01, Maxim Dounin wrote:
> >On Wed, May 12, 2010 at 04:47:02PM +0400, Igor Sysoev wrote:
> >
> >>It seems that net.inet.tcp.slowstart_flightsize does not work in 8-STABLE.
> >>For a long time I used slowstart_flightsize=2 on FreeBSD 4, 6, and 7 hosts.
> >>However, FreeBSD-8 always starts with the single packet.
> >>I saw this on different versions of 8-STABLE since 8 Oct 2009 till
> >>04 Apr 2010.
> >
> >Finally I had some time to look into it (sorry for long delay).
> >
> >1. Slow start isn't used on recent FreeBSD versions for initial snd_cwnd
> >calculations as long as you have rfc3390 support switched on (default since
> >Jan 06 23:29:46 2004, at least in 7.*).  It effectively sets initial
> >snd_cwnd to 3*MSS on common networks and shouldn't cause any problems.
> >Slowstart_flightsize only affects connection restarts.
> >
> >2. Due to a bug in the syncache code (patch below) all accepted
> >connections have their snd_cwnd reset to 1*MSS (since r171639, 7.0+
> >AFAIR).
> >
> >3. Support for rfc3465 introduced in r187289 uncovered (2) as
> >ACK to SYN/ACK no longer causes snd_cwnd increase by MSS (actually, this
> >increase shouldn't happen as it's explicitly forbidden by rfc 3390, but
> >it's another issue).  Snd_cwnd remains really small (1*MSS + 1) and this
> >causes really bad interaction with delayed acks on other side.
> >
> >As a workaround for the delayed-ack interaction problems you may
> >disable rfc3465 by setting net.inet.tcp.rfc3465 to 0.  The correct
> >fix would be to apply the patch below.
> >
> >To Andre Oppermann: could you please take a look at the patch and
> >commit it if found appropriate?
> 
> I've committed your fix with svn r210666. In a few days I will MFC it back
> to the stable branches.  Thanks for reporting the bug and a patch for it.

Andre, could you please take a look at one more patch as well?

Igor reported that he still sees 100ms delays with rfc3465 turned 
on, and it turns out to be a similar issue (setting cwnd to 1*MSS) 
for hosts found in the hostcache.

The problem with setting cwnd from hostcache was already reported 
here:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690
http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html

As using a larger cwnd from the hostcache may cause problems on 
congested links (see the second thread), I changed the code to only 
use the cached cwnd as an upper bound for cwnd (instead of fixing the 
current code).  This is also in line with what we do on connection 
restarts.

We may later consider re-adding the use of a larger cwnd from the 
hostcache.  But I believe it should be done carefully and probably 
behind a sysctl, off by default.
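
For reference, the RFC3390 upper bound on the initial window that this
thread keeps referring to is:

    IW = min(4*MSS, max(2*MSS, 4380 bytes))

For a common MSS of 1460 bytes this gives IW = min(5840, max(2920,
4380)) = 4380 bytes = 3*MSS, which matches the 3*MSS figure mentioned
above.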

# HG changeset patch
# User Maxim Dounin 
# Date 1284352618 -14400
# Node ID bbb9fea7978b26b95e96d463238a3acd8bfb5575
# Parent  6aec795c568cf6b9d2fabf8b8b9e25ad75b053d0
Use cwnd from hostcache only as upper bound.

Setting the initial congestion window from the hostcache wasn't working
for accepted connections since its introduction, due to tp->snd_wnd being
0.  As a result it was instead limiting cwnd on such connections to 1*MSS.
With net.inet.tcp.rfc3465 enabled this results in a bad interaction with
delayed acks and 100ms delays for hosts found in the hostcache.

Additionally, it's considered unsafe to use an initial congestion window
larger than that specified in RFC3390 as this may cause problems on
congested links.  RFC5681 says the equation from RFC3390 MUST be used as
an upper bound.

Links:

http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690

diff --git a/netinet/tcp_input.c b/netinet/tcp_input.c
--- a/netinet/tcp_input.c
+++ b/netinet/tcp_input.c
@@ -3332,29 +3332,13 @@ tcp_mss(struct tcpcb *tp, int offer)
tp->snd_bandwidth = metrics.rmx_bandwidth;
 
/*
-* Set the slow-start flight size depending on whether this
-* is a local network or not.
+* Set initial congestion window per RFC3390.  Alternatively, set
+* flight size depending on whether this is a local network or not.
 *
-* Extend this so we cache the cwnd too and retrieve it here.
-* Make cwnd even bigger than RFC3390 suggests but only if we
-* have previous experience with the remote host. Be careful
-* not make cwnd bigger than remote receive window or our own
-* send socket buffer. Maybe put some additional upper bound
-* on the retrieved cwnd. Should do incremental updates to
-* hostcache when cwnd collapses so next connection doesn't
-* overloads the path again.
-*
-* RFC3390 says only do this if SYN or SYN/ACK didn't got lost.
-* We currently check only in syncache_socket for that.
+* RFC3390 says we MUST limit initial window to one segment if SYN
+* or SYN/ACK is lost.  We currently check only in syncache_socket()
+* for that.
 */
-#define TCP_M

Re: bge hangs on recent 7.3-STABLE

2010-09-13 Thread Pyun YongHyeon
On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote:
> On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote:
> 
> > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote:
> > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote:
> > > > Hi,
> > > > 
> > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on 
> > > > 11.01.2010
> > > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s
> > > > without issues. One of them, however, is loaded more than others, so it
> > > > processes 20K/20K packets/s.
> > > > 
> > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010.
> > > > Then bge on this host hung two times. I was able to restart it from
> > > > console using:
> > > >   /etc/rc.d/netif restart bge0
> > > > 
> > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, 
> > > > 07.09.2010.
> > > > After reboot bge hung every several seconds. I was able to restart it,
> > > > but bge hung again after several seconds.
> > > > 
> > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since there
> > > > were several if_bge.c commits on 15.08.2010. The same hangs.
> > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before
> > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs.
> > > > 
> > > > The hosts are amd64 dual core SMP with 4G machines. bge information:
> > > > 
> > > > b...@pci0:4:0:0:class=0x02 card=0x165914e4 chip=0x165914e4 
> > > > rev=0x11 hdr=0x00
> > > > vendor = 'Broadcom Corporation'
> > > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
> > > > 
> > > > bge0:  > > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4
> > > > miibus1:  on bge0
> > > > brgphy0:  PHY 1 on miibus1
> > > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > > 1000baseT-FDX, auto
> > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a
> > > > 
> > > 
> > > Could you show me verbose boot message(bge part only)?
> > > Also show me the output of "pciconf -lcbv".
> > > 
> > 
> > Forgot to send a patch. Let me know whether attached patch fixes
> > the issue or not.
> 
> > Index: sys/dev/bge/if_bge.c
> > ===
> > --- sys/dev/bge/if_bge.c(revision 212341)
> > +++ sys/dev/bge/if_bge.c(working copy)
> > @@ -3386,9 +3386,11 @@
> > sc->bge_rx_saved_considx = rx_cons;
> > bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx);
> > if (stdcnt)
> > -   bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std);
> > +   bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std +
> > +   BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT);
> > if (jumbocnt)
> > -   bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo);
> > +   bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo +
> > +   BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT);
> >  #ifdef notyet
> > /*
> >  * This register wraps very quickly under heavy packet drops.
> 
> Thank you, it seems the patch has fixed the bug.
> > BTW, I noticed the same hangs on FreeBSD 8.1, date=2010.09.06.23.59.59
> I will apply the patch on all my updated hosts.
> 

Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and
stable/7 (including 8.1-RELEASE and 7.3-RELEASE) may suffer from
this issue. Let me know whether the other hosts work with the patch.


Re: bge hangs on recent 7.3-STABLE

2010-09-13 Thread Igor Sysoev
On Mon, Sep 13, 2010 at 11:04:47AM -0700, Pyun YongHyeon wrote:

> On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote:
> > On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote:
> > 
> > > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote:
> > > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote:
> > > > > Hi,
> > > > > 
> > > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on 
> > > > > 11.01.2010
> > > > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s
> > > > > without issues. One of them, however, is loaded more than others, so 
> > > > > it
> > > > > processes 20K/20K packets/s.
> > > > > 
> > > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010.
> > > > > Then bge on this host hung two times. I was able to restart it from
> > > > > console using:
> > > > >   /etc/rc.d/netif restart bge0
> > > > > 
> > > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, 
> > > > > 07.09.2010.
> > > > > After reboot bge hung every several seconds. I was able to restart it,
> > > > > but bge hung again after several seconds.
> > > > > 
> > > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since 
> > > > > there
> > > > > were several if_bge.c commits on 15.08.2010. The same hangs.
> > > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before
> > > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs.
> > > > > 
> > > > > The hosts are amd64 dual core SMP with 4G machines. bge information:

> > Thank you, it seems the patch has fixed the bug.
> > BTW, I noticed the same hangs on FreeBSD 8.1, date=2010.09.06.23.59.59
> > I will apply the patch on all my updated hosts.
> > 
> 
> Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and
> stable/7(including 8.1-RELEASE and 7.3-RELEASE) may suffer from
> this issue. Let me know what other hosts work with the patch.

Currently I have patched only two hosts: 7.3, 24.08.2010 and 8.1,
06.09.2010.  The 7.3 host now handles 20K/20K packets/s without issues.

One host has been downgraded to 17.03.2010, as I already reported.
The other hosts still run 7.x from January and February 2010.
If there are no hangs I will upgrade the other hosts and patch them too.


-- 
Igor Sysoev
http://sysoev.ru/en/


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Pyun YongHyeon
On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
> On 09/09/2010 07:24 PM, Pyun YongHyeon wrote:
> > On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote:
> >   
> >> Hi,
> >> I am just following up on the thread from March (I think) about this issue.
> >>
> >> We are seeing this issue on a number of systems running 7.1. 
> >>
> >> The systems in question are all Dell:
> >>
> >> * R710 R610 R410
> >> * PE2950
> >>
> >> The latter do not show the issue as much as the R series systems.
> >>
> >> The cards in one of the R610's that I am testing with are:
> >>
> >> b...@pci0:1:0:0:class=0x02 card=0x02361028 chip=0x163914e4
> >> rev=0x20 hdr=0x00
> >> vendor = 'Broadcom Corporation'
> >> device = 'NetXtreme II BCM5709 Gigabit Ethernet'
> >> class  = network
> >> subclass   = ethernet
> >>
> >> They are connected to Dell PowerConnect 5424 switches.
> >>
> >> uname -a:
> >> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4
> >> #3: Wed Sep  8 08:19:03 UTC 2010
> >> t...@dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10  
> >> amd64
> >>
> >> We are also using 8192-byte jumbo frames, if_lagg and if_vlan in the
> >> configuration (the NICs are in promisc as we are currently capturing
> >> netflow data on another VLAN for diagnostic purposes):
> >>
> >>
> >> 
> 
> >> I have updated the bce driver and the Broadcom MII driver to the
> >> version from stable/7 and am still seeing the issue.
> >>
> >> This morning I did a test with increasing the RX_PAGES to 8 but the
> >> system just hung starting the network.  The route command got stuck in a
> >> zone state (Sorry can't remember exactly which).
> >>
> >> The real question is, how do we go about increasing the number of RX
> >> BDs? I guess we have to bump more than just RX_PAGES...
> >>
> >>
> >> The cause for us, from what we can see, is the openldap server sending
> >> large group search results back to nss_ldap or pam_ldap.  When it does
> >> this it seems to send each of the 600 results in its own TCP segment
> >> creating a small packet storm (600*~100byte PDU's) at the destination
> >> host.  The kernel then retransmits 2 blocks of 100 results each after
> >> SACK kicks in for the data that was dropped by the NIC.
> >>
> >>
> >> Thanks in advance
> >>
> >> Tom
> >>
> >>
> >> 
> 
> > FW may drop incoming frames when it does not see available RX
> > buffers. Increasing the number of RX buffers slightly reduces the
> > possibility of dropping frames but it wouldn't completely fix it.
> > Alternatively the driver may announce available RX buffers in the
> > middle of RX ring processing instead of giving updated buffers at the
> > end of RX processing. This way FW may see available RX buffers while
> > the driver/upper stack is busy processing received frames. But this
> > may introduce coherency issues because the RX ring is shared between
> > host and FW. If FreeBSD had a way to sync a partial region of a DMA
> > map, this could be implemented without fear of coherency issues.
> > Another way to improve RX performance would be switching to multiple
> > RX queues with RSS but that would require a lot of work and I had no
> > time to implement it.
> >   
> 
> Does this mean that these cards are going to perform badly? This is
> what I gathered from the previous thread.
> 

I mean there is still much room for improvement in the driver for
better performance. bce(4) controllers are among the best controllers
for servers and the driver didn't take full advantage of them.

> > BTW, given that you've updated to bce(4)/mii(4) of stable/7, I
> > wonder why TX/RX flow controls were not kicked in.
> >   
> 
> The working copy I used for grabbing the upstream source is at r212371.
> 
> Last changes for the directories in my working copy:
> 
> sys/dev/bce @  211388
> sys/dev/mii @ 212020
> 
> 
> I discovered that flow control was disabled on the switches, so I set it
> to auto and added a pair of BCE_PRINTF's in the code where it enables
> and disables flow control and now it gets enabled.
> 

Ok.

> 
> Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
> of errors; however, the rate seems to be reduced compared to the
> previous version of the driver.
> 

It seems there are issues in header splitting and it was disabled
by default. Header splitting reduces packet processing overhead in
the upper layers, so it's normal to see better performance with header
splitting.


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Tom Judge
On 09/13/2010 01:48 PM, Pyun YongHyeon wrote:
> On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
>   
>>

>> Does this mean that these cards are going to perform badly? This is was
>> what I gathered from the previous thread.
>>
>> 
> I mean there is still much room for improvement in the driver for
> better performance. bce(4) controllers are among the best controllers
> for servers and the driver didn't take full advantage of them.
>
>   

So far our experiences with bce(4) on FreeBSD have been very
disappointing.  Starting when Dell switched to bce(4)-based NICs
(around the time 6.2 was released, with the introduction of the
PowerEdge x9xx hardware) we have always had problems with the driver in
every release we have used: 6.2, 7.0 and 7.1.  Luckily David has been
helpful and has helped us fix the issues.


>   
>> Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
>> of errors; however, the rate seems to be reduced compared to the
>> previous version of the driver.
>>
>> 
> It seems there are issues in header splitting and it was disabled
> by default. Header splitting reduces packet processing overhead in
> upper layer so it's normal to see better performance with header
> splitting.
>   

The reason that we have had header splitting enabled in the past is that
historically there have been issues with memory fragmentation when using
8k jumbo frames (which result in 9k mbuf clusters).
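
The fragmentation point is about cluster sizes; roughly (a sketch, not
actual bce(4) code):

    struct mbuf *m;

    /* A 9k jumbo cluster needs 3 physically contiguous pages; under
     * memory pressure these allocations start failing: */
    m = m_getjcl(M_DONTWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES);

    /* With header splitting the RX ring can instead be filled from
     * page-sized clusters, which do not fragment: */
    m = m_getjcl(M_DONTWAIT, MT_DATA, M_PKTHDR, MJUMPAGESIZE);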

I have a kernel with the following configuration in testing right now:

* Flow control enabled.
* Jumbo header splitting turned off.


Is there any way that we can fix flow control with jumbo header
splitting turned on?

Thanks

Tom

PS. The following test was more than enough to trigger buffer shortages
with header splitting on:

( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
) &
( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
) &
( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
) &

The search in question returned about 1700 entries.

-- 
TJU13-ARIN

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Andre Oppermann

On 13.09.2010 20:48, Pyun YongHyeon wrote:

On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:

Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
of errors; however, the rate seems to be reduced compared to the
previous version of the driver.



It seems there are issues in header splitting and it was disabled
by default. Header splitting reduces packet processing overhead in
upper layer so it's normal to see better performance with header
splitting.


I'm not sure that header splitting really helps much, at least for TCP.
The only place where it could make a difference is at socket buffer
append time.  There the header gets 'thrown away'.  With header splitting
the first mbuf in the chain, containing the header, can be returned to
the free pool.  Without header splitting it's just an offset change in
the mbuf.
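
In code terms the difference at append time is roughly this (a sketch;
"hdrlen" stands for the protocol header length and m for the received
chain):

    /* Without header splitting: strip the headers by adjusting the
     * data pointer/length within the same mbuf. */
    m_adj(m, hdrlen);

    /* With header splitting: the first mbuf holds only headers, so it
     * can be freed outright; m_free() returns the next mbuf in the
     * chain. */
    m = m_free(m);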

IIRC header splitting was introduced with the Tigon cards, which were
the first programmable network cards and the first to support putting
the header in a different mbuf.  Header splitting, in theory, could
make a difference with zero copy sockets where the data portion in a
separate mbuf is flipped by VM magic into userspace.  The trouble is
that no driver fully supports the semantics required for page flipping,
and the zero copy code, if compiled in, is much less optimized for
the non-flipping case than the standard code path.  With the many dozen
gigabytes per second of memory copy bandwidth of current CPUs it remains
questionable whether the page-flipping VM magic is actually faster than
a plain kernel/userspace copy as in the standard code path.  I generally
recommend not to use ZERO_COPY_SOCKETS.

I suspect in the case of the bce(4) driver the change in header splitting
is probably not the cause of the performance difference.

--
Andre


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Tom Judge
On 09/13/2010 02:11 PM, Andre Oppermann wrote:
> On 13.09.2010 20:48, Pyun YongHyeon wrote:
>> On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
>>> Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a
>>> number of errors; however, the rate seems to be reduced compared to
>>> the previous version of the driver.
Please note that 'rate' here relates to the rate at which
dev.bce.X.com_no_buffers is increasing, not to PPS or bandwidth.

However, the discussion is still interesting.
>>>
>>
>> It seems there are issues in header splitting and it was disabled
>> by default. Header splitting reduces packet processing overhead in
>> upper layer so it's normal to see better performance with header
>> splitting.
>
> I'm not sure that header splitting really helps much at least for TCP.
> The only place where it could make a difference is at socket buffer
> append time.  There the header get 'thrown away'.  With header splitting
> the first mbuf in the chain containing the header can be returned to the
> free pool.  Without header splitting it's just a offset change in the
> mbuf.
>
> IIRC header splitting was introduced with the Tigon cards which were
> the first programmable network cards and the first to support putting
> the header in a different mbuf.  Header splitting, in theory, could
> make a difference with zero copy sockets where the data portion in a
> separate mbuf is flipped by VM magic into userspace.  The trouble is
> that no driver fully supports the semantics required for page flipping
> and the zero copy code, if compiled in, is much less optimized for
> the non-flipping case than the standard code path.  With the many dozen
> gigabyte per second memory copy bandwidth of current CPU's it remains
> questionable whether the page-flipping VM magic is actually faster than
> a plain kernel/userspace copy as in the standard code path.  I generally
> recommend not to use ZERO_COPY_SOCKETS.
>
> I suspect in the case of the bce(4) driver the change in header splitting
> is probably not the cause of the performance difference.
>


-- 
TJU13-ARIN



Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Pyun YongHyeon
On Mon, Sep 13, 2010 at 02:07:58PM -0500, Tom Judge wrote:
> On 09/13/2010 01:48 PM, Pyun YongHyeon wrote:
> > On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
> >   
> >>
> 
> >> Does this mean that these cards are going to perform badly? This is
> >> what I gathered from the previous thread.
> >>
> >> 
> > I mean there is still much room for improvement in the driver for
> > better performance. bce(4) controllers are among the best controllers
> > for servers and the driver didn't take full advantage of them.
> >
> >   
> 
> So far our experiences with bce(4) on FreeBSD have been very
> disappointing.  Starting with when Dell switched to bce(4) based NIC's
> (around the time 6.2 was released and with the introduction of the Power
> Edge X9XX hardware) we have always had problems with the driver in every
> release we have used: 6.2, 7.0 and 7.1.  Luckily David has been helpful
> and helped us fix the issues.
> 
> 
> >   
> >> Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
> >> of errors; however, the rate seems to be reduced compared to the
> >> previous version of the driver.
> >>
> >> 
> > It seems there are issues in header splitting and it was disabled
> > by default. Header splitting reduces packet processing overhead in
> > upper layer so it's normal to see better performance with header
> > splitting.
> >   
> 
> The reason that we have had header splitting enabled in the past is that
> historically there have been issues with memory fragmentation when using
> 8k jumbo frames (resulting in 9k mbuf's).
> 

Yes, if you use jumbo frames, header splitting would help to reduce
memory fragmentation as header splitting wouldn't allocate jumbo
clusters.

> I have a kernel with the following configuration in testing right now:
> 
> * Flow control enabled.
> * Jumbo header splitting turned off.
> 
> 
> Is there any way that we can fix flow control with jumbo header
> splitting turned on?
> 

Flow control has nothing to do with header splitting (i.e. flow
control is always enabled).

> Thanks
> 
> Tom
> 
> PS. The following test was more than enough to trigger buffer shortages
> with header splitting on:
> 
> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
> ) &
> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
> ) &
> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
> ) &
> 
> The search in question returned about 1700 entries.
> 

I can trigger this kind of buffer shortage with benchmark tools.
Actually, fixing header splitting is on my TODO list along with other
things, but I don't know how long it will take.


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Pyun YongHyeon
On Mon, Sep 13, 2010 at 09:11:25PM +0200, Andre Oppermann wrote:
> On 13.09.2010 20:48, Pyun YongHyeon wrote:
> >On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
> >>Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
> >>of errors; however, the rate seems to be reduced compared to the
> >>previous version of the driver.
> >>
> >
> >It seems there are issues in header splitting and it was disabled
> >by default. Header splitting reduces packet processing overhead in
> >upper layer so it's normal to see better performance with header
> >splitting.
> 
> I'm not sure that header splitting really helps much at least for TCP.
> The only place where it could make a difference is at socket buffer
> append time.  There the header get 'thrown away'.  With header splitting
> the first mbuf in the chain containing the header can be returned to the
> free pool.  Without header splitting it's just a offset change in the
> mbuf.
> 
> IIRC header splitting was introduced with the Tigon cards which were
> the first programmable network cards and the first to support putting
> the header in a different mbuf.  Header splitting, in theory, could
> make a difference with zero copy sockets where the data portion in a
> separate mbuf is flipped by VM magic into userspace.  The trouble is
> that no driver fully supports the semantics required for page flipping
> and the zero copy code, if compiled in, is much less optimized for
> the non-flipping case than the standard code path.  With the many dozen
> gigabyte per second memory copy bandwidth of current CPU's it remains
> questionable whether the page-flipping VM magic is actually faster than
> a plain kernel/userspace copy as in the standard code path.  I generally
> recommend not to use ZERO_COPY_SOCKETS.
> 
> I suspect in the case of the bce(4) driver the change in header splitting
> is probably not the cause of the performance difference.
> 

I'm under the impression the header splitting in bce(4) is for
LRO (the opposite of TSO), not for VM magic to enable page flipping
tricks.

> -- 
> Andre




Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Tom Judge
On 09/13/2010 02:33 PM, Pyun YongHyeon wrote:
> On Mon, Sep 13, 2010 at 02:07:58PM -0500, Tom Judge wrote:
>   
>> On 09/13/2010 01:48 PM, Pyun YongHyeon wrote:
>> 
>>> On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
>>>   
>>>   
 
>> 
>> 
 Does this mean that these cards are going to perform badly? This is
 what I gathered from the previous thread.

 
 
>>> I mean there is still much room for improvement in the driver for
>>> better performance. bce(4) controllers are among the best controllers
>>> for servers and the driver didn't take full advantage of them.
>>>
>>>   
>>>   
>> So far our experiences with bce(4) on FreeBSD have been very
>> disappointing.  Starting with when Dell switched to bce(4) based NIC's
>> (around the time 6.2 was released and with the introduction of the Power
>> Edge X9XX hardware) we have always had problems with the driver in every
>> release we have used: 6.2, 7.0 and 7.1.  Luckily David has been helpful
>> and helped us fix the issues.
>>
>> 
>> 
>>>   
>>>   
 Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
 of errors; however, the rate seems to be reduced compared to the
 previous version of the driver.

 
 
>>> It seems there are issues with header splitting, so it was disabled
>>> by default.  Header splitting reduces packet processing overhead in the
>>> upper layers, so it's normal to see better performance with header
>>> splitting enabled.
>>>   
>>>   
>> The reason that we have had header splitting enabled in the past is that
>> historically there have been issues with memory fragmentation when using
>> 8k jumbo frames (resulting in 9k mbufs).
>>
>> 
> Yes, if you use jumbo frames, header splitting would help to reduce
> memory fragmentation, since it wouldn't allocate jumbo
> clusters.
>
>   

Under testing I have yet to see a memory fragmentation issue with this
driver.  I'll follow up if/when I find a problem with this again.

>> I have a kernel with the following configuration in testing right now:
>>
>> * Flow control enabled.
>> * Jumbo header splitting turned off.
>>
>>
>> Is there any way that we can fix flow control with jumbo header
>> splitting turned on?
>>
>> 
> Flow control has nothing to do with header splitting (i.e., flow
> control is always enabled). 
>
>   
Sorry, let me rephrase that:

Is there a way to fix the RX buffer shortage issues (when header
splitting is turned on) so that they are guarded by flow control?  Maybe
change the low watermark for flow control when it's enabled?


>> Thanks
>>
>> Tom
>>
>> PS. The following test was more than enough to trigger buffer shortages
>> with header splitting on:
>>
>> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
>> ) &
>> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
>> ) &
>> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done
>> ) &
>>
>> The search in question returned about 1700 entries.
>>
>> 
> I can trigger this kind of buffer shortage with benchmark tools.
> Actually, fixing header splitting is on my TODO list, along with
> other things, but I don't know how long it will take.
>   

Great to hear; thanks for all the hard work.

Tom

-- 
TJU13-ARIN

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Pyun YongHyeon
On Mon, Sep 13, 2010 at 03:38:41PM -0500, Tom Judge wrote:
> On 09/13/2010 02:33 PM, Pyun YongHyeon wrote:
> > On Mon, Sep 13, 2010 at 02:07:58PM -0500, Tom Judge wrote:
> >   
> >> On 09/13/2010 01:48 PM, Pyun YongHyeon wrote:
> >> 
> >>> On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
> >>>   
> >>>   
>  
> >> 
> >> 
>  Does this mean that these cards are going to perform badly? This was
>  what I gathered from the previous thread.
> 
>  
>  
> >>> I mean there is still a lot of room for improvement in the driver.
> >>> bce(4) controllers are among the best controllers for servers, and the
> >>> driver doesn't yet take full advantage of them.
> >>>
> >>>   
> >>>   
> >> So far our experiences with bce(4) on FreeBSD have been very
> >> disappointing.  Starting when Dell switched to bce(4)-based NICs
> >> (around the time 6.2 was released, with the introduction of the Power
> >> Edge X9XX hardware) we have had problems with the driver in every
> >> release we have used: 6.2, 7.0 and 7.1.  Luckily David has been helpful
> >> and helped us fix the issues.
> >>
> >> 
> >> 
> >>>   
> >>>   
>  Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
>  of errors; however, the rate seems to be reduced compared to the
>  previous version of the driver.
> 
>  
>  
> >>> It seems there are issues with header splitting, so it was disabled
> >>> by default.  Header splitting reduces packet processing overhead in the
> >>> upper layers, so it's normal to see better performance with header
> >>> splitting enabled.
> >>>   
> >>>   
> >> The reason that we have had header splitting enabled in the past is that
> >> historically there have been issues with memory fragmentation when using
> >> 8k jumbo frames (resulting in 9k mbufs).
> >>
> >> 
> > Yes, if you use jumbo frames, header splitting would help to reduce
> > memory fragmentation, since it wouldn't allocate jumbo
> > clusters.
> >
> >   
> 
> Under testing I have yet to see a memory fragmentation issue with this
> driver.  I'll follow up if/when I find a problem with this again.
> 
> >> I have a kernel with the following configuration in testing right now:
> >>
> >> * Flow control enabled.
> >> * Jumbo header splitting turned off.
> >>
> >>
> >> Is there any way that we can fix flow control with jumbo header
> >> splitting turned on?
> >>
> >> 
> > Flow control has nothing to do with header splitting (i.e., flow
> > control is always enabled). 
> >
> >   
> Sorry, let me rephrase that:
> 
> Is there a way to fix the RX buffer shortage issues (when header
> splitting is turned on) so that they are guarded by flow control?  Maybe
> change the low watermark for flow control when it's enabled?
> 

I'm not sure how much it would help, but try changing the RX low
watermark.  The default value is 32, which seems to be a reasonable
value.  But it's only for 5709/5716 controllers, and Linux seems to
use a different default value.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: bge hangs on recent 7.3-STABLE

2010-09-13 Thread Vlad Galu
On Mon, Sep 13, 2010 at 9:04 PM, Pyun YongHyeon  wrote:
> On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote:
>> On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote:
>>
>> > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote:
>> > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote:
>> > > > Hi,
>> > > >
>> > > > I have several hosts running FreeBSD/amd64 7.2-STABLE, updated on
>> > > > 11.01.2010 and 25.02.2010. The hosts process about 10K input and
>> > > > 10K output packets/s without issues. One of them, however, is more
>> > > > heavily loaded than the others, so it processes 20K/20K packets/s.
>> > > >
>> > > > Recently, I upgraded one host to 7.3-STABLE, 24.08.2010.
>> > > > Then bge on this host hung twice. I was able to restart it from
>> > > > the console using:
>> > > >   /etc/rc.d/netif restart bge0
>> > > >
>> > > > Then I upgraded the most loaded (20K/20K) host to 7.3-STABLE,
>> > > > 07.09.2010.
>> > > > After reboot bge hung every few seconds. I was able to restart it,
>> > > > but bge hung again after a few seconds.
>> > > >
>> > > > Then I downgraded this host to 7.3-STABLE, 14.08.2010, since there
>> > > > were several if_bge.c commits on 15.08.2010. The hangs remained.
>> > > > Then I downgraded this host to 7.3-STABLE, 17.03.2010, before
>> > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs.
>> > > >
>> > > > The hosts are dual-core amd64 SMP machines with 4G of RAM. bge information:
>> > > >
>> > > > b...@pci0:4:0:0:        class=0x02 card=0x165914e4 chip=0x165914e4 
>> > > > rev=0x11 hdr=0x00
>> > > >     vendor     = 'Broadcom Corporation'
>> > > >     device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
>> > > >
>> > > > bge0: > > > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4
>> > > > miibus1:  on bge0
>> > > > brgphy0:  PHY 1 on miibus1
>> > > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
>> > > > 1000baseT-FDX, auto
>> > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a
>> > > >
>> > >
>> > > Could you show me the verbose boot message (bge part only)?
>> > > Also show me the output of "pciconf -lcbv".
>> > >
>> >
>> > Forgot to send a patch. Let me know whether the attached patch fixes
>> > the issue or not.
>>
>> > Index: sys/dev/bge/if_bge.c
>> > ===================================================================
>> > --- sys/dev/bge/if_bge.c    (revision 212341)
>> > +++ sys/dev/bge/if_bge.c    (working copy)
>> > @@ -3386,9 +3386,11 @@
>> >     sc->bge_rx_saved_considx = rx_cons;
>> >     bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx);
>> >     if (stdcnt)
>> > -           bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std);
>> > +           bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std +
>> > +               BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT);
>> >     if (jumbocnt)
>> > -           bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo);
>> > +           bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo +
>> > +               BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT);
>> >  #ifdef notyet
>> >     /*
>> >      * This register wraps very quickly under heavy packet drops.
>>
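A note on the arithmetic in the patch: with a ring of N descriptors,
(i + N - 1) % N is simply i - 1 computed without going negative; for
example, with N = 512 an index of 0 maps to 511.  The mailbox is thus
written with a producer index one slot behind the last initialized
descriptor.
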
>> Thank you, it seems the patch has fixed the bug.
>> BTW, I noticed the same hangs on FreeBSD 8.1, date=2010.09.06.23.59.59.
>> I will apply the patch on all my updated hosts.
>>
>
> Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and
> stable/7 (including 8.1-RELEASE and 7.3-RELEASE) may suffer from
> this issue. Let me know how the other hosts work with the patch.

Hi Pyun,

Thanks for the patch. It seems to have fixed the symptom in my case,
on a card identical to Igor's, but on board an IBM eServer 306m.

Regards,
Vlad

-- 
Good, fast & cheap. Pick any two.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: bce(4) - com_no_buffers (Again)

2010-09-13 Thread Pyun YongHyeon
On Mon, Sep 13, 2010 at 03:21:13PM -0700, David Christensen wrote:
> > I'm under the impression the header splitting in bce(4) is there for
> > LRO (the receive-side counterpart of TSO), not for VM magic to enable
> > page-flipping tricks.
> 
> Header splitting was implemented in the Linux version of bce(4)
> to prevent jumbo memory allocations.  Allocating 9KB frames was
> causing problems on systems used for virtualization.  (Harder to
> find a contiguous 9KB frame when a hypervisor is in use.)  Using 
> 4KB or smaller buffer sizes was considered more compatible with
> virtualization.  
> 
> LRO (Large Receive Offload, aka Transparent Packet Aggregation
> or TPA on the 10Gb controllers) is not supported on the 1Gb 
> bce(4) devices.
> 

I meant the tcp_lro implementation in FreeBSD. ATM tcp_lro_rx() runs a
long list of sanity checks before combining TCP segments into a single
segment, but if the TCP header is split from its payload I guess we can
optimize that path. This way we may also be able to support LRO over
VLAN, I guess.
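
As a rough sketch of the idea (a hypothetical helper, assuming the
controller delivered the Ethernet/IP/TCP headers alone in the first
mbuf; this is not the existing tcp_lro API):

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/ethernet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

/*
 * If the controller split the headers into their own mbuf, the chain
 * shape alone says where the TCP header is, so an LRO path could skip
 * re-walking the packet before attempting to merge the segment.
 */
static struct tcphdr *
lro_split_tcphdr(struct mbuf *m)
{
        struct ether_header *eh;
        struct ip *ip;

        if (m->m_next == NULL ||
            m->m_len != sizeof(*eh) + sizeof(*ip) + sizeof(struct tcphdr))
                return (NULL);  /* not a header-split TCP/IPv4 frame */
        eh = mtod(m, struct ether_header *);
        if (eh->ether_type != htons(ETHERTYPE_IP))
                return (NULL);
        ip = (struct ip *)(eh + 1);
        if (ip->ip_p != IPPROTO_TCP || ip->ip_hl != (sizeof(*ip) >> 2))
                return (NULL);  /* IP options: fall back to the slow path */
        return ((struct tcphdr *)(ip + 1));
}
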
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


RE: bce(4) - com_no_buffers (Again)

2010-09-13 Thread David Christensen
> I'm under the impression the header splitting in bce(4) is there for
> LRO (the receive-side counterpart of TSO), not for VM magic to enable
> page-flipping tricks.

Header splitting was implemented in the Linux version of bce(4)
to prevent jumbo memory allocations.  Allocating 9KB frames was
causing problems on systems used for virtualization.  (Harder to
find a contiguous 9KB frame when a hypervisor is in use.)  Using 
4KB or smaller buffer sizes was considered more compatible with
virtualization.  
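
In FreeBSD terms the allocation contrast looks roughly like this
(illustrative helpers, not actual bce(4) code):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>

/*
 * One physically contiguous 9k cluster: this allocation can fail on
 * a fragmented system even when plenty of memory is free overall.
 */
static struct mbuf *
rx_buf_jumbo(void)
{
        return (m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES));
}

/*
 * A page-sized cluster: satisfiable from any single free page, so
 * physical fragmentation does not matter.
 */
static struct mbuf *
rx_buf_page(void)
{
        return (m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUMPAGESIZE));
}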

LRO (Large Receive Offload, aka Transparent Packet Aggregation
or TPA on the 10Gb controllers) is not supported on the 1Gb 
bce(4) devices.

Dave

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: bge hangs on recent 7.3-STABLE

2010-09-13 Thread Pyun YongHyeon
On Tue, Sep 14, 2010 at 01:08:08AM +0300, Vlad Galu wrote:
> On Mon, Sep 13, 2010 at 9:04 PM, Pyun YongHyeon  wrote:
> > On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote:
> >> On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote:
> >>
> >> > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote:
> >> > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I have several hosts running FreeBSD/amd64 7.2-STABLE, updated on
> >> > > > 11.01.2010 and 25.02.2010. The hosts process about 10K input and
> >> > > > 10K output packets/s without issues. One of them, however, is more
> >> > > > heavily loaded than the others, so it processes 20K/20K packets/s.
> >> > > >
> >> > > > Recently, I upgraded one host to 7.3-STABLE, 24.08.2010.
> >> > > > Then bge on this host hung twice. I was able to restart it from
> >> > > > the console using:
> >> > > >   /etc/rc.d/netif restart bge0
> >> > > >
> >> > > > Then I upgraded the most loaded (20K/20K) host to 7.3-STABLE,
> >> > > > 07.09.2010.
> >> > > > After reboot bge hung every few seconds. I was able to restart it,
> >> > > > but bge hung again after a few seconds.
> >> > > >
> >> > > > Then I downgraded this host to 7.3-STABLE, 14.08.2010, since there
> >> > > > were several if_bge.c commits on 15.08.2010. The hangs remained.
> >> > > > Then I downgraded this host to 7.3-STABLE, 17.03.2010, before
> >> > > > the first if_bge.c commit after 25.02.2010. Now it runs without
> >> > > > hangs.
> >> > > >
> >> > > > The hosts are dual-core amd64 SMP machines with 4G of RAM. bge information:
> >> > > >
> >> > > > b...@pci0:4:0:0:        class=0x02 card=0x165914e4 
> >> > > > chip=0x165914e4 rev=0x11 hdr=0x00
> >> > > >     vendor     = 'Broadcom Corporation'
> >> > > >     device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
> >> > > >
> >> > > > bge0:  >> > > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4
> >> > > > miibus1:  on bge0
> >> > > > brgphy0:  PHY 1 on miibus1
> >> > > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> >> > > > 1000baseT-FDX, auto
> >> > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a
> >> > > >
> >> > >
> >> > > Could you show me the verbose boot message (bge part only)?
> >> > > Also show me the output of "pciconf -lcbv".
> >> > >
> >> >
> >> > Forgot to send a patch. Let me know whether the attached patch fixes
> >> > the issue or not.
> >>
> >> > Index: sys/dev/bge/if_bge.c
> >> > ===================================================================
> >> > --- sys/dev/bge/if_bge.c    (revision 212341)
> >> > +++ sys/dev/bge/if_bge.c    (working copy)
> >> > @@ -3386,9 +3386,11 @@
> >> >     sc->bge_rx_saved_considx = rx_cons;
> >> >     bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx);
> >> >     if (stdcnt)
> >> > -           bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std);
> >> > +           bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std +
> >> > +               BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT);
> >> >     if (jumbocnt)
> >> > -           bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo);
> >> > +           bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo +
> >> > +               BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT);
> >> >  #ifdef notyet
> >> >     /*
> >> >      * This register wraps very quickly under heavy packet drops.
> >>
> >> Thank you, it seems the patch has fixed the bug.
> >> BTW, I noticed the same hangs on FreeBSD 8.1, date=2010.09.06.23.59.59.
> >> I will apply the patch on all my updated hosts.
> >>
> >
> > Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and
> > stable/7 (including 8.1-RELEASE and 7.3-RELEASE) may suffer from
> > this issue. Let me know how the other hosts work with the patch.
> 
> Hi Pyun,
> 
> Thanks for the patch. It seems to have fixed the symptom in my case,
> on a card identical to Igor's, but on board an IBM eServer 306m.
> 

Thanks for reporting and testing! I really appreciate it.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: FreeBSD route tables limited 16?

2010-09-13 Thread Julian Elischer

On 9/13/10 5:18 PM, Dave Seddon wrote:

Greetings Julian,

I've been wondering if it's possible to increase the number of FreeBSD
route tables.  It seems this is currently 4 bits; however, I was
wondering about perhaps 16 bits?



Yes, the code is designed to handle many more, and if you do
create more then everything SHOULD handle it.
The bottleneck is that we need to store an associated fib with
each outgoing (or for that matter incoming) packet, but we do not at
this time want to dedicate a whole word in the mbuf to the task.
My "hack" for 8.x (before it was done) was to hide the information
in the flags word of the mbuf.
I only took 4 bits to make sure I didn't trample on other
people's use of bits there.  The plan is/was to make a separate
entry in the mbuf some time after 7.x branched (say, "now" for
example :-)  )
You could just steal more bits for now, but if you take 8 bits
there will only be one spare.

(see /sys/sys/mbuf.h)

It may just be time to bite the bullet and steal the entry.
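
A simplified sketch of the bit-stealing trick (the names and the shift
position are illustrative; sys/mbuf.h has the real macros):

#include <stdint.h>

#define FIB_SHIFT       28      /* four otherwise-unused high flag bits */
#define FIB_MASK        0x0f

static inline unsigned int
get_fib(uint32_t flags)
{
        return ((flags >> FIB_SHIFT) & FIB_MASK);
}

static inline uint32_t
set_fib(uint32_t flags, unsigned int fib)
{
        return ((flags & ~((uint32_t)FIB_MASK << FIB_SHIFT)) |
            ((uint32_t)(fib & FIB_MASK) << FIB_SHIFT));
}

Taking 8 bits instead just means widening FIB_MASK to 0xff and moving
the shift down, at the cost of the spare flag bits mentioned above.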

Out of curiosity, why do you need > 16 fibs?

Have you considered using vnet jails as well?





/* MRT compile-time constants */
#ifdef _KERNEL
  #ifndef ROUTETABLES
   #define RT_NUMFIBS 1
   #define RT_MAXFIBS 1
  #else
   /* while we use 4 bits in the mbuf flags, we are limited to 16 */
   #define RT_MAXFIBS 16
   #if ROUTETABLES > RT_MAXFIBS
#define RT_NUMFIBS RT_MAXFIBS
#error "ROUTETABLES defined too big"
   #else
#if ROUTETABLES == 0
 #define RT_NUMFIBS 1
#else
 #define RT_NUMFIBS ROUTETABLES
#endif
   #endif
  #endif
#endif
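
For reference, the knob above is set from the kernel configuration
file, for example:

options         ROUTETABLES=16  # capped at RT_MAXFIBS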

Really liked your announcement years ago:
http://lists.freebsd.org/pipermail/freebsd-arch/2007-December/007331.html

Kind regards,
Dave Seddon
+61 447 SEDDON
d...@seddon.ca

-Original Message-
From: Andrew Hannam
To: d...@seddon.ca
Subject: RE: FreeBSD route tables - limited to 16 :(
Date: Mon, 13 Sep 2010 15:55:47 +1000
Mailer: Microsoft Office Outlook 12.0

I think the gentleman is confusing route tables with routes.
150K routes are easily possible, but it is obvious there is currently only
support for up to 16 route tables.

I think that you are right and the number of bits will need to be updated.

I don't know the answer to the 'route leaking' question and it has been a long 
time since I looked at this code.

You really need to speak to the specialist responsible for the multiple route 
table code. This person should be clearly marked in the code headers.

I'm guessing that no-one has thought about using it the way you are planning to 
use it.

If I get some time I will have a look - but don't hold your breath.

Regards,
Andrew.

-Original Message-
From: Dave Seddon [mailto:d...@seddon.ca]
Sent: Saturday, 11 September 2010 12:52 AM
To: Aldous, Matthew D
Cc: d...@seddon.ca; Andrew Hannam; Truman Boyes
Subject: RE: FreeBSD route tables - limited to 16 :(

Greetings,

I'm guessing we need to adjust the number of bits defined for the route
table in the mbuf structure definition (wherever that is); then we
can update route.h to match.

Really, I guess we should make the mbuf code _and_ the route.h code pick up
the KERNCONF definition of the ROUTETABLES variable.

Andrew - thoughts on this?

I'm not sure if the firewall rules allow you to update the route table
variable in the mbuf, but if they don't we should allow it.  This
would be equivalent to what they call 'route leaking' in MPLS speak,
where you can pop traffic from one VPN into another (very nasty, but
sometimes handy).


Yes, ipfw does allow you to do this, but it needs some more work..
It only really works as the naive user might expect on incoming packets.
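
For the archives, the relevant ipfw action looks like this (rule
number and interface are illustrative):

ipfw add 100 setfib 1 ip from any to any in recv em0

Packets matching the rule are then looked up in FIB 1 instead of the
default FIB 0.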



Regards,
Dave

On Fri, 2010-09-10 at 19:05 +1000, Aldous, Matthew D wrote:




From: Dave Seddon [d...@seddon.ca]
Sent: Friday, 10 September 2010 6:36 PM
To: Andrew Hannam
Cc: d...@seddon.ca; Aldous, Matthew D; Truman Boyes
Subject: FreeBSD route tables - limited to 16 :(

I just tried compiling FreeBSD 8.1 with 1024 route tables.  It throws
an error, which I tracked down to /usr/src/sys/net/route.h (around
line 99).  The limit is 16 because, as the comments say, this is 4
bits.  We need to look into increasing this to, say, 16 bits :).
Given that each mbuf will carry this, it could cause a significant
increase in memory usage on a system with a large number of packets
(although who cares, RAM is cheap).


/* MRT compile-time constants */
#ifdef _KERNEL
  #ifndef ROUTETABLES
   #define RT_NUMFIBS 1
   #define RT_MAXFIBS 1
  #else
   /* while we use 4 bits in the mbuf flags, we are limited to 16 */
   #define RT_MAXFIBS 16
   #if ROUTETABLES > RT_MAXFIBS
#define RT_NUMFIBS RT_MAXFIBS
#error "ROUTETABLES defined too big"
   #else
#if ROUTETABLES == 0
 #define RT_NUMFIBS 1
#else
 #define RT_NUMFIBS ROUTETABLES
#endif
   #endif
  #endif
#endif








___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net