Re: Removal of deprecated implied connect for TCP
Based on the feedback I withdraw the proposal to remove implied connect from TCP. Instead I will look at it more closely and fix any loose ends that may have come from other changes in the TCP code.

Many good points have been raised and I will repeat them here again for the archives:

o In FreeBSD most, if not all, protocols support implied connect; removing it from TCP would make it an outlier.
o It is being used by at least one product based on FreeBSD.
o It can speed up the data sending phase by sending data on the ACK after the SYN-ACK. [In RFC1644 it already sent data on the initial SYN, but no one is accepting that anymore.]

It is important to note, though, that implied connect in TCP is non-standard and no other even remotely popular OS supports it. Thus any applications making use of it are non-portable.

-- Andre

On 11.09.2010 17:38, Randall Stewart wrote:
All: One thing to note.. when you can do an implied connection setup, the 3-way handshake has the potential to carry data (don't know if TCP does in FreeBSD) on the third leg of the 3-way handshake. This is one of the reasons SCTP uses this.. since we often will carry data on the third and even possibly the 4th leg of the handshake (we have one extra leg). Taking this feature out of TCP will make it so we will be like all other OS's and the socket semantic will prevent you from doing data on the third leg:

SYN-->
<---SYN-ACK--
---ACK-->

instead of

SYN-->
<---SYN-ACK--
---ACK+DATA-->

In the past I have mentioned in classes I teach that TCP is capable of this but the OS's of the world do not allow this latter behavior.. Just thoughts and ramblings ;-) R

On Sep 10, 2010, at 2:51 PM, Karim Fodil-Lemelin wrote:
On 31/08/2010 5:32 PM, Robert Watson wrote:
On Tue, 31 Aug 2010, Andre Oppermann wrote:
I'm not entirely comfortable with this change, and would like a chance to cogitate on it a bit more. While I'm not aware of any applications depending on the semantic for TCP, I know that we do use it for UNIX domain sockets.

I don't have any plans to remove the implied connect support from the socket layer or other protocols, only from TCP.

Right -- the implicit question is: why should TCP be the only stream protocol in our stack *not* to support implied connection, when we plan to continue to support it for all other protocols?

For deprecating this part of the TCP API: there is no documentation of the implied connect in tcp(4). sendto(2) doesn't differentiate between protocols and simply says: "... sendto() and sendmsg() may be used at any time." For MSG_EOF it says that it is only supported for SOCK_STREAM sockets in the PF_INET protocol family. These sentences have to be corrected.

In general, deprecating is taken to mean providing significant and explicit advance warning of removal -- for example, updating the 8.x man page to point out that the feature is deprecated and it will not appear in future releases of FreeBSD.

Robert

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Hi, For what it's worth, we at Xiphos (now XipLink) are still using sendto and T/TCP, and it is one of the reasons we chose FreeBSD more than 10 years ago! Best regards, Karim.
-- Randall Stewart 803-317-4952 (cell)

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
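[For readers who have not used the feature discussed above, here is a minimal userland sketch of an implied connect: sendto(2) is handed both the payload and the destination address on a SOCK_STREAM socket that was never connect()ed, with MSG_EOF marking the end of the transaction. send_transaction() is a hypothetical helper, error handling is abbreviated, and with T/TCP gone the data goes out after the handshake rather than on the SYN.]

/*
 * Minimal sketch of an "implied connect" on a TCP socket: the data and
 * the destination address are passed to sendto() on an unconnected
 * socket.  MSG_EOF additionally shuts down the send side once the data
 * has been queued (transaction-style usage).
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

int
send_transaction(const char *ip, unsigned short port,
    const void *buf, size_t len)
{
    struct sockaddr_in dst;
    int s;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == -1)
        return (-1);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_len = sizeof(dst);          /* BSD-specific length field */
    dst.sin_port = htons(port);
    inet_pton(AF_INET, ip, &dst.sin_addr);

    /*
     * No connect() call: the sendto() below triggers the implied
     * connect (SYN, SYN-ACK, ACK) and queues the payload, which is
     * sent once the handshake has completed.
     */
    if (sendto(s, buf, len, MSG_EOF,
        (struct sockaddr *)&dst, sizeof(dst)) == -1) {
        close(s);
        return (-1);
    }
    close(s);
    return (0);
}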
Re: Removal of deprecated implied connect for TCP
On 11.09.2010 17:38, Randall Stewart wrote:
All: One thing to note.. when you can do an implied connection setup, the 3-way handshake has the potential to carry data (don't know if TCP does in FreeBSD) on the third leg of the 3-way handshake. This is one of the reasons SCTP uses this.. since we often will carry data on the third and even possibly the 4th leg of the handshake (we have one extra leg). Taking this feature out of TCP will make it so we will be like all other OS's and the socket semantic will prevent you from doing data on the third leg:

SYN-->
<---SYN-ACK--
---ACK-->

instead of

SYN-->
<---SYN-ACK--
---ACK+DATA-->

In the past I have mentioned in classes I teach that TCP is capable of this but the OS's of the world do not allow this latter behavior..

The savings in TCP for the case you describe here are not that great. By piggy-backing data on the third leg you save one (small) ACK packet and one round trip to userspace for the application to start sending data. There is no need to wait for a full network round-trip time to start sending data.

The real savings from implied connect with RFC1644 came from sending data together with the initial SYN. The receiving side would either queue the data until the 3WHS was complete, or upon later invocations directly create a socket upon SYN. The trouble with the way of doing it in RFC1644 was the very weak protection against very simple DoS attacks. Only a connection count variable was used to prevent fake SYNs from opening new sockets. This, plus the required socket layer changes (the implied connect), caused a quick halt to any further RFC1644 adoption.

-- Andre

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Current problem reports assigned to freebsd-net@FreeBSD.org
Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker       Resp. Description
o kern/150257   net   [msk] watchdog timeout
o kern/150251   net   [patch] [ixgbe] Late cable insertion broken
o kern/150249   net   [ixgbe] Media type detection broken
o kern/150247   net   [patch] [ixgbe] Version in -current won't build on 7.x
o bin/150224    net   ppp does not reassign static IP after kill -KILL comma
o kern/150148   net   [ath] Atheros 5424/2424 - AR2425 stopped working with
o kern/150052   net   wi(4) driver does not work with wlan(4) driver for Luc
f kern/149969   net   [wlan] [ral] ralink rt2661 fails to maintain connectio
o kern/149937   net   [ipfilter] [patch] kernel panic in ipfilter IP fragmen
o kern/149804   net   [icmp] [panic] ICMP redirect on causes "panic: rtqkill
o kern/149786   net   [bwn] bwn on Dell Inspiron 1150: connections stall
o kern/149643   net   [rum] device not sending proper beacon frames in ap mo
o kern/149609   net   [panic] reboot after adding second default route
o kern/149539   net   [ath] atheros ar9287 is not supported by ath_hal
o kern/149516   net   [ath] ath(4) hostap with fake MAC/BSSID results in sta
o kern/149373   net   [realtek/atheros]: None of my network card working
o kern/149307   net   [ath] Doesn't work Atheros 9285
o kern/149306   net   [alc] Doesn't work Atheros AR8131 PCIe Gigabit Etherne
o kern/149117   net   [inet] [patch] in_pcbbind: redundant test
o kern/149086   net   [multicast] Generic multicast join failure in 8.1
o kern/148862   net   [panic] page fault while in kernel mode at _mtx_lock_s
o kern/148322   net   [ath] Triggering atheros wifi beacon misses in hostap
o kern/148317   net   [ath] FreeBSD 7.x hostap memory leak in net80211 or At
o kern/148078   net   [ath] wireless networking stops functioning
o kern/147985   net   [alc] alc network driver + tso ( + vlan ? ) does not w
o kern/147894   net   [ipsec] IPv6-in-IPv4 does not work inside an ESP-only
o kern/147862   net   [wpi] Possible bug in the wpi driver. Network Manager
o kern/147155   net   [ip6] setfb not work with ipv6
o kern/146909   net   [rue] rue(4) does not detect OQO model01 network contr
o kern/146845   net   [libc] close(2) returns error 54 (connection reset by
o kern/146792   net   [flowtable] flowcleaner 100% cpu's core load
o kern/146759   net   [cxgb] [patch] cxgb panic calling cxgb_set_lro() witho
o kern/146719   net   [pf] [panic] PF or dumynet kernel panic
o kern/146534   net   [icmp6] wrong source address in echo reply
o kern/146517   net   [ath] [wlan] device timeouts for ath wlan device on re
o kern/146427   net   [mwl] Additional virtual access points don't work on m
o kern/146426   net   [mwl] 802.11n rates not possible on mwl
o kern/146425   net   [mwl] mwl dropping all packets during and after high u
f kern/146394   net   [vlan] IP source address for outgoing connections
o bin/146377    net   [ppp] [tun] Interface doesn't clear addresses when PPP
o kern/146358   net   [vlan] wrong destination MAC address
o kern/146165   net   [wlan] [panic] Setting bssid in adhoc mode causes pani
o kern/146082   net   [ng_l2tp] a false invaliant check was performed in ng_
o kern/146037   net   [panic] mpd + CoA = kernel panic
o bin/145934    net   [patch] add count option to netstat(1)
o kern/145826   net   [ath] Unable to configure adhoc mode on ath0/wlan0
o kern/145825   net   [panic] panic: soabort: so_count
o kern/145777   net   [wpi] Intel 3945ABG driver breaks the connection after
o kern/145728   net   [lagg] Stops working lagg between two servers.
o amd64/145654  net   amd64-curent memory leak in kernel
o kern/144987   net   [wpi] [panic] injecting packets with wlaninject using
o kern/144882   net   MacBookPro =>4.1 does not connect to BSD in hostap wit
o kern/144874   net   [if_bridge] [patch] if_bridge frees mbuf after pfil ho
o conf/144700   net   [rc.d] async dhclient breaks stuff for too many people
o kern/144642   net   [rum] [panic] Enabling rum interface causes panic
o kern/144616   net   [nat] [panic] ip_nat panic FreeBSD 7.2
o kern/144572   net   [carp] CARP preemption mode traffic partially goes to
f kern/144315   net   [ipfw] [panic] freebsd 8-stable reboot after add ipfw
o kern/143939   net   [ipfw] [em] ipfw nat and em interface rxcsum problem
o kern/143874   net   [wpi] Wireless 3945ABG error. wpi0 could not allocate
o kern/143868   net   [ath] [patch] [request] allow Atheros watchdog timeout
TCP loopback socket fusing
When a TCP connection is made via loopback to localhost, the whole send, segmentation and receive path (with larger packets though) is still executed. This has considerable overhead.

To short-circuit the send and receive sockets on localhost TCP connections I've made a proof-of-concept patch that directly places the data in the other side's socket buffer without doing any packetization and other protocol overhead (like UNIX domain sockets). The connection setup (SYN, SYN-ACK, ACK) and shutdown are still handled by normal TCP segments via loopback so that firewalling still works. The actual payload data during the session won't be seen and the sequence numbers don't move other than for SYN and FIN. The sequence numbers remain valid though. Obviously tcpdump won't see any data transfers either if the connection has fused sockets.

Preliminary testing (with WITNESS and INVARIANTS enabled) has shown stable operation and a rough doubling of the throughput on loopback connections. I've tested most socket teardown cases and it behaves fine. I'm not entirely sure I've got all possible paths but the way it is integrated should properly defuse the sockets in all situations.

Testers and feedback wanted: http://people.freebsd.org/~andre/tcp_loopfuse-20100913.diff

-- Andre

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
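[Not part of the patch, but a rough way to measure the before/after difference: the self-contained micro-benchmark below connects a process to itself over 127.0.0.1 and reports the achieved throughput. All names, sizes and round counts are illustrative and error handling is omitted for brevity.]

/*
 * Rough loopback throughput micro-benchmark: bind/listen on 127.0.0.1,
 * connect to ourselves, then shuffle CHUNK-sized writes and reads and
 * print MB/s.  Useful for a before/after comparison of the fusing patch.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK   8192
#define ROUNDS  200000

int
main(void)
{
    struct sockaddr_in sin;
    socklen_t slen = sizeof(sin);
    char buf[CHUNK];
    struct timespec t0, t1;
    double secs;
    int lsn, cli, srv, i;
    ssize_t n, got;

    memset(buf, 0xa5, sizeof(buf));
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                   /* let the kernel pick a port */

    lsn = socket(AF_INET, SOCK_STREAM, 0);
    bind(lsn, (struct sockaddr *)&sin, sizeof(sin));
    listen(lsn, 1);
    getsockname(lsn, (struct sockaddr *)&sin, &slen);

    cli = socket(AF_INET, SOCK_STREAM, 0);
    connect(cli, (struct sockaddr *)&sin, sizeof(sin));
    srv = accept(lsn, NULL, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ROUNDS; i++) {
        write(cli, buf, CHUNK);
        /* drain the full chunk on the receiving side before the next write */
        for (got = 0; got < CHUNK; got += n)
            n = read(srv, buf, CHUNK - got);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f MB/s\n", (double)CHUNK * ROUNDS / secs / 1e6);
    return (0);
}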
Re: Removal of deprecated implied connect for TCP
Hi, On 2010-8-29, at 16:22, Andre Oppermann wrote: > T/TCP was ill-defined and had major security issues and never gained > any support. It has been defunct in FreeBSD and most code has been > removed about 6 years ago. we're also about to declare the T/TCP RFCs Historic. See http://tools.ietf.org/html/draft-eggert-tcpm-historicize (which is a work item in the TCPM working group despite not being a draft-ietf-... at the moment.) Lars

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: TCP loopback socket fusing
On 13.09.2010 14:45, Poul-Henning Kamp wrote: In message<4c8e0c1e.2020...@networx.ch>, Andre Oppermann writes: To short-circuit the send and receive sockets on localhost TCP connections I've made a proof-of-concept patch that directly places the data in the other side's socket buffer without doing any packetization and other protocol overhead [...] Can we keep the option (sysctl ?) of doing the full packet thing, it is a very convenient debugging tool... Yes, an appropriate sysctl is already contained in the patch (w/o man page documentation yet). -- Andre ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: TCP loopback socket fusing
In message <4c8e0c1e.2020...@networx.ch>, Andre Oppermann writes: >To short-circuit the send and receive sockets on localhost TCP connections >I've made a proof-of-concept patch that directly places the data in the >other side's socket buffer without doing any packetization and other protocol >overhead [...] Can we keep the option (sysctl ?) of doing the full packet thing, it is a very convenient debugging tool... -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
What about net.isr ?
Hi all. I have a dual Intel Xeon E5506 box running mpd5, dummynet and pf. Sometimes I get about 500+ PPPoE connections to this machine, the network traffic goes to 30 Mbps and CPU usage hits 100%. I would like to know if netisr would help me use the other processor cores, and where I can get docs about it. My network card is a dual port Broadcom NetXtreme II BCM5709. Thanks in advance ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bge hangs on recent 7.3-STABLE
On Fri, Sep 10, 2010 at 07:39:15AM +0400, Igor Sysoev wrote: > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote: > > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote: > > > Hi, > > > > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on > > > 11.01.2010 > > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s > > > without issues. One of them, however, is loaded more than others, so it > > > processes 20K/20K packets/s. > > > > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010. > > > Then bge on this host hung two times. I was able to restart it from > > > console using: > > > /etc/rc.d/netif restart bge0 > > > > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, > > > 07.09.2010. > > > After reboot bge hung every several seconds. I was able to restart it, > > > but bge hung again after several seconds. > > > > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since there > > > were several if_bge.c commits on 15.08.2010. The same hangs. > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs. > > > > > > The hosts are amd64 dual core SMP with 4G machines. bge information: > > > > > > b...@pci0:4:0:0:class=0x02 card=0x165914e4 chip=0x165914e4 > > > rev=0x11 hdr=0x00 > > > vendor = 'Broadcom Corporation' > > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)' > > > > > > bge0: > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4 > > > miibus1: on bge0 > > > brgphy0: PHY 1 on miibus1 > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-FDX, auto > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a > > > > > > > Could you show me verbose boot message(bge part only)? > > Also show me the output of "pciconf -lcbv". > > Here is "pciconf -lcbv", I will send the "boot -v" part later. > > b...@pci0:4:0:0: class=0x02 card=0x165914e4 chip=0x165914e4 rev=0x11 > hdr=0x00 > vendor = 'Broadcom Corporation' > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)' > class = network > subclass = ethernet > bar [10] = type Memory, range 64, base 0xfe5f, size 65536, enabled > cap 01[48] = powerspec 2 supports D0 D3 current D0 > cap 03[50] = VPD > cap 05[58] = MSI supports 8 messages, 64 bit > cap 10[d0] = PCI-Express 1 endpoint max data 128(128) link x1(x1) Sorry for delay. Here is "boot -v" part. It is from other host, but the host hungs too: pci4: on pcib4 pci4: domain=0, physical bus=4 found-> vendor=0x14e4, dev=0x1659, revid=0x11 domain=0, bus=4, slot=0, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0006, statreg=0x0010, cachelnsz=8 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=5 powerspec 2 supports D0 D3 current D0 MSI supports 8 messages, 64 bit map[10]: type Memory, range 64, base 0xfe5f, size 16, enabled pcib4: requested memory range 0xfe5f-0xfe5f: good pcib0: matched entry for 0.13.INTA (src \_SB_.PCI0.APC4:0) pcib0: slot 13 INTA routed to irq 19 via \_SB_.PCI0.APC4 pcib4: slot 0 INTA is routed to irq 19 pci0:4:0:0: bad VPD cksum, remain 14 bge0: mem 0 xfe5f-0xfe5f irq 19 at device 0.0 on pci4 bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfe5f bge0: CHIP ID 0x4101; ASIC REV 0x04; CHIP REV 0x41; PCI-E miibus1: on bge0 brgphy0: PHY 1 on miibus1 brgphy0: OUI 0x000818, model 0x0018, rev. 
0 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge0: bpf attached bge0: Ethernet address: 00:e0:81:5c:64:85 ioapic0: routing intpin 19 (PCI IRQ 19) to vector 54 bge0: [MPSAFE] bge0: [ITHREAD] -- Igor Sysoev http://sysoev.ru/en/ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bge hangs on recent 7.3-STABLE
On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote: > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote: > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote: > > > Hi, > > > > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on > > > 11.01.2010 > > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s > > > without issues. One of them, however, is loaded more than others, so it > > > processes 20K/20K packets/s. > > > > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010. > > > Then bge on this host hung two times. I was able to restart it from > > > console using: > > > /etc/rc.d/netif restart bge0 > > > > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, > > > 07.09.2010. > > > After reboot bge hung every several seconds. I was able to restart it, > > > but bge hung again after several seconds. > > > > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since there > > > were several if_bge.c commits on 15.08.2010. The same hangs. > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs. > > > > > > The hosts are amd64 dual core SMP with 4G machines. bge information: > > > > > > b...@pci0:4:0:0:class=0x02 card=0x165914e4 chip=0x165914e4 > > > rev=0x11 hdr=0x00 > > > vendor = 'Broadcom Corporation' > > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)' > > > > > > bge0: > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4 > > > miibus1: on bge0 > > > brgphy0: PHY 1 on miibus1 > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-FDX, auto > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a > > > > > > > Could you show me verbose boot message(bge part only)? > > Also show me the output of "pciconf -lcbv". > > > > Forgot to send a patch. Let me know whether attached patch fixes > the issue or not. > Index: sys/dev/bge/if_bge.c > === > --- sys/dev/bge/if_bge.c (revision 212341) > +++ sys/dev/bge/if_bge.c (working copy) > @@ -3386,9 +3386,11 @@ > sc->bge_rx_saved_considx = rx_cons; > bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx); > if (stdcnt) > - bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std); > + bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std + > + BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT); > if (jumbocnt) > - bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo); > + bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo + > + BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT); > #ifdef notyet > /* >* This register wraps very quickly under heavy packet drops. Thank you, it seems the patch has fixed the bug. BTW, I noticed the same hungs on FreeBSD 8.1, date=2010.09.06.23.59.59 I will apply the patch on all my updated hosts. -- Igor Sysoev http://sysoev.ru/en/ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
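[For clarity on the patch quoted above: instead of writing the "next free" producer index to the RX producer mailbox, the driver now writes the index of the last descriptor actually refilled, i.e. the producer index stepped back one slot with wrap-around. A standalone restatement of that arithmetic (the ring size shown is only illustrative, not taken from the driver headers):]

/* Standalone illustration of the mailbox index computation in the patch. */
#include <stdio.h>

#define RING_CNT 512                    /* illustrative ring size */

static unsigned int
last_filled(unsigned int prod)
{
    /* step back one slot, wrapping around at the start of the ring */
    return (prod + RING_CNT - 1) % RING_CNT;
}

int
main(void)
{
    printf("%u %u %u\n", last_filled(0), last_filled(1), last_filled(511));
    /* prints: 511 0 510 */
    return (0);
}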
Re: bce(4) - com_no_buffers (Again)
On 09/09/2010 07:24 PM, Pyun YongHyeon wrote: > On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote: > >> Hi, >> I am just following up on the thread from March (I think) about this issue. >> >> We are seeing this issue on a number of systems running 7.1. >> >> The systems in question are all Dell: >> >> * R710 R610 R410 >> * PE2950 >> >> The latter do not show the issue as much as the R series systems. >> >> The cards in one of the R610's that I am testing with are: >> >> b...@pci0:1:0:0:class=0x02 card=0x02361028 chip=0x163914e4 >> rev=0x20 hdr=0x00 >> vendor = 'Broadcom Corporation' >> device = 'NetXtreme II BCM5709 Gigabit Ethernet' >> class = network >> subclass = ethernet >> >> They are connected to Dell PowerConnect 5424 switches. >> >> uname -a: >> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4 >> #3: Wed Sep 8 08:19:03 UTC 2010 >> t...@dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10 amd64 >> >> We are also using 8192 byte jumbo frames, if_lagg and if_vlan in the >> configuration (the nics are in promisc as we are currently capturing >> netflow data on another vlan for diagnostic purposes. ): >> >> >> >> I have updated the bce driver and the Broadcomm MII driver to the >> version from stable/7 and am still seeing the issue. >> >> This morning I did a test with increasing the RX_PAGES to 8 but the >> system just hung starting the network. The route command got stuck in a >> zone state (Sorry can't remember exactly which). >> >> The real question is, how do we go about increasing the number of RX >> BDs? I guess we have to bump more that just RX_PAGES... >> >> >> The cause for us, from what we can see, is the openldap server sending >> large group search results back to nss_ldap or pam_ldap. When it does >> this it seems to send each of the 600 results in its own TCP segment >> creating a small packet storm (600*~100byte PDU's) at the destination >> host. The kernel then retransmits 2 blocks of 100 results each after >> SACK kicks in for the data that was dropped by the NIC. >> >> >> Thanks in advance >> >> Tom >> >> >> > FW may drop incoming frames when it does not see available RX > buffers. Increasing number of RX buffers slightly reduce the > possibility of dropping frames but it wouldn't completely fix it. > Alternatively driver may tell available RX buffers in the middle > of RX ring processing instead of giving updated buffers at the end > of RX processing. This way FW may see available RX buffers while > driver/upper stack is busy to process received frames. But this may > introduce coherency issues because the RX ring is shared between > host and FW. If FreeBSD has way to sync partial region of a DMA > map, this could be implemented without fear of coherency issue. > Another way to improve RX performance would be switching to > multi-RX queue with RSS but that would require a lot of work and I > had no time to implement it. > Does this mean that these cards are going to perform badly? This is was what I gathered from the previous thread. > BTW, given that you've updated to bce(4)/mii(4) of stable/7, I > wonder why TX/RX flow controls were not kicked in. > The working copy I used for grabbing the upstream source is at r212371. Last changes for the directories in my working copy: sys/dev/bce @ 211388 sys/dev/mii @ 212020 I discovered that flow control was disabled on the switches, so I set it to auto and added a pair of BCE_PRINTF's in the code where it enables and disables flow control and now it gets enabled. 
Without BCE_JUMBO_HDRSPLIT we see no errors. With it we see a number of errors, however the rate seems to be reduced compared to the previous version of the driver. Tom -- TJU13-ARIN ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: What about net.isr ?
On Mon, 2010-09-13 at 10:18 -0300, Marcos Vinícius Buzo wrote:
> Hi all.
>
> I have a dual Intel Xeon E5506 box running mpd5, dummynet and pf. Sometimes
> I get about 500+ PPPoE connections to this machine, the network traffic goes
> to 30 Mbps and CPU usage hits 100%. I would like to know if netisr would help
> me use the other processor cores, and where I can get docs about it.
> My network card is a dual port Broadcom NetXtreme II BCM5709.
>
> Thanks in advance

In the case of a PPPoE server netisr did not give me any benefits; it tries to handle all traffic with one thread. I have no idea why, maybe it was a mistake in my setup. Check what your Broadcom card can do: if it has MSI-X with multiple vectors it is better to set the number of vectors to the number of cores (maybe number of cores - 1) and set the sysctls net.isr.direct=1 and net.isr.direct_force=1. In this case traffic processing will be divided between the CPU cores and your router will feel much better. I'm using Intel network cards on a single Xeon 5620 for pppoe + dummynet + nat; the box handles up to 70 kpps of traffic and 800+ connections. Also, megabits/s do not matter for routers; packets/s is what creates the real CPU load.

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
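[The tunables named above are ordinary sysctls. They are normally set with sysctl(8) or in /etc/sysctl.conf; the sketch below only shows reading and setting them programmatically with sysctlbyname(3), assuming they are writable at runtime on the FreeBSD version in use.]

/* Minimal sketch: read and enable net.isr.direct / net.isr.direct_force. */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
    int direct, one = 1;
    size_t len = sizeof(direct);

    if (sysctlbyname("net.isr.direct", &direct, &len, NULL, 0) == 0)
        printf("net.isr.direct = %d\n", direct);

    /* Setting these requires root privileges. */
    if (sysctlbyname("net.isr.direct", NULL, NULL, &one, sizeof(one)) != 0)
        perror("net.isr.direct");
    if (sysctlbyname("net.isr.direct_force", NULL, NULL, &one, sizeof(one)) != 0)
        perror("net.isr.direct_force");
    return (0);
}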
Re: net.inet.tcp.slowstart_flightsize in 8-STABLE
Hello! On Fri, Aug 06, 2010 at 10:56:40AM +0200, Andre Oppermann wrote: > On 13.07.2010 16:01, Maxim Dounin wrote: > >On Wed, May 12, 2010 at 04:47:02PM +0400, Igor Sysoev wrote: > > > >>It seems that net.inet.tcp.slowstart_flightsize does not work in 8-STABLE. > >>For a long time I used slowstart_flightsize=2 on FreeBSD 4, 6, and 7 hosts. > >>However, FreeBSD-8 always starts with the single packet. > >>I saw this on different versions of 8-STABLE since 8 Oct 2009 till > >>04 Apr 2010. > > > >Finally I had some time to look into it (sorry for long delay). > > > >1. Slow start isn't used on recent FreeBSD versions for initial snd_cwnd > >calculations as long as you have rfc3390 support switched on (default since > >Jan 06 23:29:46 2004, at least in 7.*). It effectively sets initial > >snd_cwnd to 3*MSS on common networks and shouldn't cause any problems. > >Slowstart_flightsize only affects connection restarts. > > > >2. Due to bug in syncache code (patch below) all accepted connections has > >their snd_cwnd reset to 1*MSS (since r171639, 7.0+ AFAIR). > > > >3. Support for rfc3465 introduced in r187289 uncovered (2) as > >ACK to SYN/ACK no longer causes snd_cwnd increase by MSS (actually, this > >increase shouldn't happen as it's explicitly forbidden by rfc 3390, but > >it's another issue). Snd_cwnd remains really small (1*MSS + 1) and this > >causes really bad interaction with delayed acks on other side. > > > >As a workaround to delayed acks interaction problems you may disable > >rfc3465 by setting net.inet.tcp.rfc3465 to 0. Correct fix would be to apply > >the patch below. > > > >To Andre Oppermann: could you please take a look at the patch and > >commit it if found appropriate? > > I've committed your fix with svn r210666. In a few days I will MFC it back > to the stable branches. Thanks for reporting the bug and a patch for it. Andre, could you please take a look at one more patch as well? Igor reported that it still sees 100ms delays with rfc3465 turned on, and it turns out to be similar issue (setting cwnd to 1*MSS) for hosts found in hostcache. The problem with setting cwnd from hostcache was already reported here: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690 http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html As using larger cwnd from hostcache may cause problems on congested links (see second thread) I changed code to only use cached cwnd as an upper bound for cwnd (instead of fixing current code). This is also in-line with what we do on connection restarts. We may later consider re-adding usage of larger cwnd from hostcache. But I believe it should be done carefully and probably behind sysctl, off by default. # HG changeset patch # User Maxim Dounin # Date 1284352618 -14400 # Node ID bbb9fea7978b26b95e96d463238a3acd8bfb5575 # Parent 6aec795c568cf6b9d2fabf8b8b9e25ad75b053d0 Use cwnd from hostcache only as upper bound. Setting initial congestion window from hostcache wasn't working for accepted connection since introduction due to tp->snd_wnd being 0. As a result it was instead limiting cwnd on such connections to 1*MSS. With net.inet.tcp.rfc3465 enabled this results bad interaction with delayed acks and 100ms delays for hosts found in hostcache. Additionally, it's considered unsafe to use initial congestion window larger than thouse specified in RFC3390 as this may cause problems on congested links. RFC5681 says equation from RFC3390 MUST be used as upper bound. 
Links: http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690 diff --git a/netinet/tcp_input.c b/netinet/tcp_input.c --- a/netinet/tcp_input.c +++ b/netinet/tcp_input.c @@ -3332,29 +3332,13 @@ tcp_mss(struct tcpcb *tp, int offer) tp->snd_bandwidth = metrics.rmx_bandwidth; /* -* Set the slow-start flight size depending on whether this -* is a local network or not. +* Set initial congestion window per RFC3390. Alternatively, set +* flight size depending on whether this is a local network or not. * -* Extend this so we cache the cwnd too and retrieve it here. -* Make cwnd even bigger than RFC3390 suggests but only if we -* have previous experience with the remote host. Be careful -* not make cwnd bigger than remote receive window or our own -* send socket buffer. Maybe put some additional upper bound -* on the retrieved cwnd. Should do incremental updates to -* hostcache when cwnd collapses so next connection doesn't -* overloads the path again. -* -* RFC3390 says only do this if SYN or SYN/ACK didn't got lost. -* We currently check only in syncache_socket for that. +* RFC3390 says we MUST limit initial window to one segment if SYN +* or SYN/ACK is lost. We currently check only in syncache_socket() +* for that. */ -#define TCP_M
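[As context for the commit message above: the RFC3390 initial window it falls back to is IW = min(4*MSS, max(2*MSS, 4380 bytes)). A standalone restatement of that formula (not the kernel code itself); for the common 1460-byte MSS it yields 4380 bytes = 3*MSS, matching the "3*MSS on common networks" figure quoted earlier in the thread.]

/* RFC3390 initial congestion window, restated as a helper. */
#include <stdio.h>

static unsigned int
rfc3390_initial_cwnd(unsigned int mss)
{
    unsigned int iw = 2 * mss;

    if (iw < 4380)          /* max(2*MSS, 4380 bytes) */
        iw = 4380;
    if (iw > 4 * mss)       /* min(..., 4*MSS) */
        iw = 4 * mss;
    return (iw);
}

int
main(void)
{
    printf("%u %u %u\n",
        rfc3390_initial_cwnd(1460),     /* 4380 (= 3*MSS) */
        rfc3390_initial_cwnd(536),      /* 2144 (= 4*MSS) */
        rfc3390_initial_cwnd(4380));    /* 8760 (= 2*MSS) */
    return (0);
}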
Re: bge hangs on recent 7.3-STABLE
On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote: > On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote: > > > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote: > > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote: > > > > Hi, > > > > > > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on > > > > 11.01.2010 > > > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s > > > > without issues. One of them, however, is loaded more than others, so it > > > > processes 20K/20K packets/s. > > > > > > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010. > > > > Then bge on this host hung two times. I was able to restart it from > > > > console using: > > > > /etc/rc.d/netif restart bge0 > > > > > > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, > > > > 07.09.2010. > > > > After reboot bge hung every several seconds. I was able to restart it, > > > > but bge hung again after several seconds. > > > > > > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since there > > > > were several if_bge.c commits on 15.08.2010. The same hangs. > > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before > > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs. > > > > > > > > The hosts are amd64 dual core SMP with 4G machines. bge information: > > > > > > > > b...@pci0:4:0:0:class=0x02 card=0x165914e4 chip=0x165914e4 > > > > rev=0x11 hdr=0x00 > > > > vendor = 'Broadcom Corporation' > > > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)' > > > > > > > > bge0: > > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4 > > > > miibus1: on bge0 > > > > brgphy0: PHY 1 on miibus1 > > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > > 1000baseT-FDX, auto > > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a > > > > > > > > > > Could you show me verbose boot message(bge part only)? > > > Also show me the output of "pciconf -lcbv". > > > > > > > Forgot to send a patch. Let me know whether attached patch fixes > > the issue or not. > > > Index: sys/dev/bge/if_bge.c > > === > > --- sys/dev/bge/if_bge.c(revision 212341) > > +++ sys/dev/bge/if_bge.c(working copy) > > @@ -3386,9 +3386,11 @@ > > sc->bge_rx_saved_considx = rx_cons; > > bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx); > > if (stdcnt) > > - bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std); > > + bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std + > > + BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT); > > if (jumbocnt) > > - bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo); > > + bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo + > > + BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT); > > #ifdef notyet > > /* > > * This register wraps very quickly under heavy packet drops. > > Thank you, it seems the patch has fixed the bug. > BTW, I noticed the same hungs on FreeBSD 8.1, date=2010.09.06.23.59.59 > I will apply the patch on all my updated hosts. > Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and stable/7(including 8.1-RELEASE and 7.3-RELEASE) may suffer from this issue. Let me know what other hosts work with the patch. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bge hangs on recent 7.3-STABLE
On Mon, Sep 13, 2010 at 11:04:47AM -0700, Pyun YongHyeon wrote: > On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote: > > On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote: > > > > > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote: > > > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote: > > > > > Hi, > > > > > > > > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on > > > > > 11.01.2010 > > > > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s > > > > > without issues. One of them, however, is loaded more than others, so > > > > > it > > > > > processes 20K/20K packets/s. > > > > > > > > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010. > > > > > Then bge on this host hung two times. I was able to restart it from > > > > > console using: > > > > > /etc/rc.d/netif restart bge0 > > > > > > > > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, > > > > > 07.09.2010. > > > > > After reboot bge hung every several seconds. I was able to restart it, > > > > > but bge hung again after several seconds. > > > > > > > > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since > > > > > there > > > > > were several if_bge.c commits on 15.08.2010. The same hangs. > > > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before > > > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs. > > > > > > > > > > The hosts are amd64 dual core SMP with 4G machines. bge information: > > Thank you, it seems the patch has fixed the bug. > > BTW, I noticed the same hungs on FreeBSD 8.1, date=2010.09.06.23.59.59 > > I will apply the patch on all my updated hosts. > > > > Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and > stable/7(including 8.1-RELEASE and 7.3-RELEASE) may suffer from > this issue. Let me know what other hosts work with the patch. Currently I have patched two hosts only: 7.3, 24.08.2010 and 8.1, 06.09.2010. 7.3 now handles 20K/20K packets/s without issues. One host has been downgraded to 17.03.2010 as I already reported. Other hosts still run 7.x, from January and February 2010. If there not will be hangs I will upgrade other hosts and will patch them. -- Igor Sysoev http://sysoev.ru/en/ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bce(4) - com_no_buffers (Again)
On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote: > On 09/09/2010 07:24 PM, Pyun YongHyeon wrote: > > On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote: > > > >> Hi, > >> I am just following up on the thread from March (I think) about this issue. > >> > >> We are seeing this issue on a number of systems running 7.1. > >> > >> The systems in question are all Dell: > >> > >> * R710 R610 R410 > >> * PE2950 > >> > >> The latter do not show the issue as much as the R series systems. > >> > >> The cards in one of the R610's that I am testing with are: > >> > >> b...@pci0:1:0:0:class=0x02 card=0x02361028 chip=0x163914e4 > >> rev=0x20 hdr=0x00 > >> vendor = 'Broadcom Corporation' > >> device = 'NetXtreme II BCM5709 Gigabit Ethernet' > >> class = network > >> subclass = ethernet > >> > >> They are connected to Dell PowerConnect 5424 switches. > >> > >> uname -a: > >> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4 > >> #3: Wed Sep 8 08:19:03 UTC 2010 > >> t...@dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10 > >> amd64 > >> > >> We are also using 8192 byte jumbo frames, if_lagg and if_vlan in the > >> configuration (the nics are in promisc as we are currently capturing > >> netflow data on another vlan for diagnostic purposes. ): > >> > >> > >> > > >> I have updated the bce driver and the Broadcomm MII driver to the > >> version from stable/7 and am still seeing the issue. > >> > >> This morning I did a test with increasing the RX_PAGES to 8 but the > >> system just hung starting the network. The route command got stuck in a > >> zone state (Sorry can't remember exactly which). > >> > >> The real question is, how do we go about increasing the number of RX > >> BDs? I guess we have to bump more that just RX_PAGES... > >> > >> > >> The cause for us, from what we can see, is the openldap server sending > >> large group search results back to nss_ldap or pam_ldap. When it does > >> this it seems to send each of the 600 results in its own TCP segment > >> creating a small packet storm (600*~100byte PDU's) at the destination > >> host. The kernel then retransmits 2 blocks of 100 results each after > >> SACK kicks in for the data that was dropped by the NIC. > >> > >> > >> Thanks in advance > >> > >> Tom > >> > >> > >> > > > FW may drop incoming frames when it does not see available RX > > buffers. Increasing number of RX buffers slightly reduce the > > possibility of dropping frames but it wouldn't completely fix it. > > Alternatively driver may tell available RX buffers in the middle > > of RX ring processing instead of giving updated buffers at the end > > of RX processing. This way FW may see available RX buffers while > > driver/upper stack is busy to process received frames. But this may > > introduce coherency issues because the RX ring is shared between > > host and FW. If FreeBSD has way to sync partial region of a DMA > > map, this could be implemented without fear of coherency issue. > > Another way to improve RX performance would be switching to > > multi-RX queue with RSS but that would require a lot of work and I > > had no time to implement it. > > > > Does this mean that these cards are going to perform badly? This is was > what I gathered from the previous thread. > I mean there are still many rooms to be done in driver for better performance. bce(4) controllers are one of best controllers for servers and driver didn't take full advantage of it. 
> > BTW, given that you've updated to bce(4)/mii(4) of stable/7, I > > wonder why TX/RX flow controls were not kicked in. > > > > The working copy I used for grabbing the upstream source is at r212371. > > Last changes for the directories in my working copy: > > sys/dev/bce @ 211388 > sys/dev/mii @ 212020 > > > I discovered that flow control was disabled on the switches, so I set it > to auto and added a pair of BCE_PRINTF's in the code where it enables > and disables flow control and now it gets enabled. > Ok. > > Without BCE_JUMBO_HDRSPLIT then we see no errors. With it we see number > of errors, however the rate seems to be reduced compaired to the > previous version of the driver. > It seems there are issues in header splitting and it was disabled by default. Header splitting reduces packet processing overhead in upper layer so it's normal to see better performance with header splitting. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bce(4) - com_no_buffers (Again)
On 09/13/2010 01:48 PM, Pyun YongHyeon wrote: > On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote: > >> >> Does this mean that these cards are going to perform badly? This is was >> what I gathered from the previous thread. >> >> > I mean there are still many rooms to be done in driver for better > performance. bce(4) controllers are one of best controllers for > servers and driver didn't take full advantage of it. > > So far our experiences with bce(4) on FreeBSD have been very disappointing. Starting with when Dell switched to bce(4) based NIC's (around the time 6.2 was released and with the introduction of the Power Edge X9XX hardware) we have always had problems with the driver in every release we have used: 6.2, 7.0 and 7.1. Luckily David has been helpful and helped us fix the issues. > >> Without BCE_JUMBO_HDRSPLIT then we see no errors. With it we see number >> of errors, however the rate seems to be reduced compaired to the >> previous version of the driver. >> >> > It seems there are issues in header splitting and it was disabled > by default. Header splitting reduces packet processing overhead in > upper layer so it's normal to see better performance with header > splitting. > The reason that we have had header splitting enabled in the past is that historically there have been issues with memory fragmentation when using 8k jumbo frames (resulting in 9k mbuf's). I have a kernel with the following configuration in testing right now: * Flow control enabled. * Jumbo header splitting turned off. Is there any way that we can fix flow control with jumbo header splitting turned on? Thanks Tom PS. The following test was more than enough to trigger buffer shortages with header splitting on: ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done ) & ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done ) & ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done ) & The search in question returned about 1700 entries. -- TJU13-ARIN ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bce(4) - com_no_buffers (Again)
On 13.09.2010 20:48, Pyun YongHyeon wrote:
On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
Without BCE_JUMBO_HDRSPLIT we see no errors. With it we see a number of errors, however the rate seems to be reduced compared to the previous version of the driver.

It seems there are issues in header splitting and it was disabled by default. Header splitting reduces packet processing overhead in upper layer so it's normal to see better performance with header splitting.

I'm not sure that header splitting really helps much, at least for TCP. The only place where it could make a difference is at socket buffer append time. There the header gets 'thrown away'. With header splitting the first mbuf in the chain containing the header can be returned to the free pool. Without header splitting it's just an offset change in the mbuf.

IIRC header splitting was introduced with the Tigon cards, which were the first programmable network cards and the first to support putting the header in a different mbuf. Header splitting, in theory, could make a difference with zero copy sockets where the data portion in a separate mbuf is flipped by VM magic into userspace. The trouble is that no driver fully supports the semantics required for page flipping, and the zero copy code, if compiled in, is much less optimized for the non-flipping case than the standard code path. With the many dozen gigabytes per second of memory copy bandwidth of current CPUs it remains questionable whether the page-flipping VM magic is actually faster than a plain kernel/userspace copy as in the standard code path. I generally recommend not to use ZERO_COPY_SOCKETS.

I suspect in the case of the bce(4) driver the change in header splitting is probably not the cause of the performance difference.

-- Andre

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
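[To make the "offset change" point concrete, a conceptual kernel-style sketch, not taken from any driver or from the stack itself, of the two ways the protocol header can be dropped at socket-buffer append time. strip_header(), hdrlen and hdr_split are hypothetical names used only for illustration; m_free(9) and m_adj(9) are the standard mbuf routines.]

#include <sys/param.h>
#include <sys/mbuf.h>

static struct mbuf *
strip_header(struct mbuf *m, int hdrlen, int hdr_split)
{
    if (hdr_split) {
        /* Header sits alone in the leading mbuf: unlink and free it. */
        return (m_free(m));     /* m_free() returns the next mbuf */
    }
    /* Header shares the leading mbuf with data: just move the data offset. */
    m_adj(m, hdrlen);
    return (m);
}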
Re: bce(4) - com_no_buffers (Again)
On 09/13/2010 02:11 PM, Andre Oppermann wrote: > On 13.09.2010 20:48, Pyun YongHyeon wrote: >> On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote: >>> Without BCE_JUMBO_HDRSPLIT then we see no errors. With it we see >>> number >>> of errors, however the rate seems to be reduced compaired to the >>> previous version of the driver. Please note that 'rate' here relates to the rate at which dev.bce.X.com_no_buffers is increasing not to PPS or bandwidth. However the discussion is still interesting. >>> >> >> It seems there are issues in header splitting and it was disabled >> by default. Header splitting reduces packet processing overhead in >> upper layer so it's normal to see better performance with header >> splitting. > > I'm not sure that header splitting really helps much at least for TCP. > The only place where it could make a difference is at socket buffer > append time. There the header get 'thrown away'. With header splitting > the first mbuf in the chain containing the header can be returned to the > free pool. Without header splitting it's just a offset change in the > mbuf. > > IIRC header splitting was introduced with the Tigeon cards which were > the first programmable network cards and the first to support putting > the header in a different mbuf. Header splitting, in theory, could > make a difference with zero copy sockets where the data portion in a > separate mbuf is flipped by VM magic into userspace. The trouble is > that no driver fully supports the semantics required for page flipping > and the zero copy code, if compiled in, is less much less optimized for > the non-flipping case than the standard code path. With the many dozen > gigabyte per second memory copy bandwidth of current CPU's it remains > questionable whether the page-flipping VM magic is actually faster than > a plain kernel/userspace copy as in the standard code path. I generally > recommend not to use ZERO_COPY_SOCKETS. > > I suspect in the case of the bce(4) driver the change in header splitting > is probably not the cause of the performance difference. > -- TJU13-ARIN ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bce(4) - com_no_buffers (Again)
On Mon, Sep 13, 2010 at 02:07:58PM -0500, Tom Judge wrote: > On 09/13/2010 01:48 PM, Pyun YongHyeon wrote: > > On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote: > > > >> > > >> Does this mean that these cards are going to perform badly? This is was > >> what I gathered from the previous thread. > >> > >> > > I mean there are still many rooms to be done in driver for better > > performance. bce(4) controllers are one of best controllers for > > servers and driver didn't take full advantage of it. > > > > > > So far our experiences with bce(4) on FreeBSD have been very > disappointing. Starting with when Dell switched to bce(4) based NIC's > (around the time 6.2 was released and with the introduction of the Power > Edge X9XX hardware) we have always had problems with the driver in every > release we have used: 6.2, 7.0 and 7.1. Luckily David has been helpful > and helped us fix the issues. > > > > > >> Without BCE_JUMBO_HDRSPLIT then we see no errors. With it we see number > >> of errors, however the rate seems to be reduced compaired to the > >> previous version of the driver. > >> > >> > > It seems there are issues in header splitting and it was disabled > > by default. Header splitting reduces packet processing overhead in > > upper layer so it's normal to see better performance with header > > splitting. > > > > The reason that we have had header splitting enabled in the past is that > historically there have been issues with memory fragmentation when using > 8k jumbo frames (resulting in 9k mbuf's). > Yes, if you use jumbo frames, header splitting would help to reduce memory fragmentation as header splitting wouldn't allocate jumbo clusters. > I have a kernel with the following configuration in testing right now: > > * Flow control enabled. > * Jumbo header splitting turned off. > > > Is there any way that we can fix flow control with jumbo header > splitting turned on? > Flow control has nothing to do with header splitting(i.e. flow control is always enabled). > Thanks > > Tom > > PS. The following test was more than enough to trigger buffer shortages > with header splitting on: > > ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done > ) & > ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done > ) & > ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done > ) & > > The search in question returned about 1700 entries. > I can trigger this kind of buffer shortage with benchmark tools. Actually fixing header splitting is on my TODO list as well as other things but I don't know how long it would take. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bce(4) - com_no_buffers (Again)
On Mon, Sep 13, 2010 at 09:11:25PM +0200, Andre Oppermann wrote: > On 13.09.2010 20:48, Pyun YongHyeon wrote: > >On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote: > >>Without BCE_JUMBO_HDRSPLIT then we see no errors. With it we see number > >>of errors, however the rate seems to be reduced compaired to the > >>previous version of the driver. > >> > > > >It seems there are issues in header splitting and it was disabled > >by default. Header splitting reduces packet processing overhead in > >upper layer so it's normal to see better performance with header > >splitting. > > I'm not sure that header splitting really helps much at least for TCP. > The only place where it could make a difference is at socket buffer > append time. There the header get 'thrown away'. With header splitting > the first mbuf in the chain containing the header can be returned to the > free pool. Without header splitting it's just a offset change in the > mbuf. > > IIRC header splitting was introduced with the Tigeon cards which were > the first programmable network cards and the first to support putting > the header in a different mbuf. Header splitting, in theory, could > make a difference with zero copy sockets where the data portion in a > separate mbuf is flipped by VM magic into userspace. The trouble is > that no driver fully supports the semantics required for page flipping > and the zero copy code, if compiled in, is less much less optimized for > the non-flipping case than the standard code path. With the many dozen > gigabyte per second memory copy bandwidth of current CPU's it remains > questionable whether the page-flipping VM magic is actually faster than > a plain kernel/userspace copy as in the standard code path. I generally > recommend not to use ZERO_COPY_SOCKETS. > > I suspect in the case of the bce(4) driver the change in header splitting > is probably not the cause of the performance difference. > I'm under the impression the header splitting in bce(4) is for LRO(opposite of TSO), not for VM magic to enable page flipping tricks. > -- > Andre ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Ramadan heureux mon cher
I am Mrs Claire Page sending you this mail from my sick bed in the hospital. Please contact my lawyer, Email: (barr_willam_fr...@lawyer.com) ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bce(4) - com_no_buffers (Again)
On 09/13/2010 02:33 PM, Pyun YongHyeon wrote: > On Mon, Sep 13, 2010 at 02:07:58PM -0500, Tom Judge wrote: > >> On 09/13/2010 01:48 PM, Pyun YongHyeon wrote: >> >>> On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote: >>> >>> >> >> Does this mean that these cards are going to perform badly? This is was what I gathered from the previous thread. >>> I mean there are still many rooms to be done in driver for better >>> performance. bce(4) controllers are one of best controllers for >>> servers and driver didn't take full advantage of it. >>> >>> >>> >> So far our experiences with bce(4) on FreeBSD have been very >> disappointing. Starting with when Dell switched to bce(4) based NIC's >> (around the time 6.2 was released and with the introduction of the Power >> Edge X9XX hardware) we have always had problems with the driver in every >> release we have used: 6.2, 7.0 and 7.1. Luckily David has been helpful >> and helped us fix the issues. >> >> >> >>> >>> Without BCE_JUMBO_HDRSPLIT then we see no errors. With it we see number of errors, however the rate seems to be reduced compaired to the previous version of the driver. >>> It seems there are issues in header splitting and it was disabled >>> by default. Header splitting reduces packet processing overhead in >>> upper layer so it's normal to see better performance with header >>> splitting. >>> >>> >> The reason that we have had header splitting enabled in the past is that >> historically there have been issues with memory fragmentation when using >> 8k jumbo frames (resulting in 9k mbuf's). >> >> > Yes, if you use jumbo frames, header splitting would help to reduce > memory fragmentation as header splitting wouldn't allocate jumbo > clusters. > > Under testing I have yet to see a memory fragmentation issue with this driver. I follow up if/when I find a problem with this again. >> I have a kernel with the following configuration in testing right now: >> >> * Flow control enabled. >> * Jumbo header splitting turned off. >> >> >> Is there any way that we can fix flow control with jumbo header >> splitting turned on? >> >> > Flow control has nothing to do with header splitting(i.e. flow > control is always enabled). > > Sorry let me rephrase that: Is there a way to fix the RX buffer shortage issues (when header splitting is turned on) so that they are guarded by flow control. Maybe change the low watermark for flow control when its enabled? >> Thanks >> >> Tom >> >> PS. The following test was more than enough to trigger buffer shortages >> with header splitting on: >> >> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done >> ) & >> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done >> ) & >> ( while true; do ldapsearch -h ldap-server1 -b "ou=Some,o=Base" dn; done >> ) & >> >> The search in question returned about 1700 entries. >> >> > I can trigger this kind of buffer shortage with benchmark tools. > Actually fixing header splitting is on my TODO list as well as > other things but I don't know how long it would take. > Great to here, thanks for all the hard work. Tom -- TJU13-ARIN ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: bce(4) - com_no_buffers (Again)
On Mon, Sep 13, 2010 at 03:38:41PM -0500, Tom Judge wrote: > On 09/13/2010 02:33 PM, Pyun YongHyeon wrote: > > On Mon, Sep 13, 2010 at 02:07:58PM -0500, Tom Judge wrote: > > > >> On 09/13/2010 01:48 PM, Pyun YongHyeon wrote: > >> > >>> On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote: > >>> > >>> > > >> > >> > Does this mean that these cards are going to perform badly? This is was > what I gathered from the previous thread. > > > > >>> I mean there are still many rooms to be done in driver for better > >>> performance. bce(4) controllers are one of best controllers for > >>> servers and driver didn't take full advantage of it. > >>> > >>> > >>> > >> So far our experiences with bce(4) on FreeBSD have been very > >> disappointing. Starting with when Dell switched to bce(4) based NIC's > >> (around the time 6.2 was released and with the introduction of the Power > >> Edge X9XX hardware) we have always had problems with the driver in every > >> release we have used: 6.2, 7.0 and 7.1. Luckily David has been helpful > >> and helped us fix the issues. > >> > >> > >> > >>> > >>> > Without BCE_JUMBO_HDRSPLIT then we see no errors. With it we see number > of errors, however the rate seems to be reduced compaired to the > previous version of the driver. > > > > >>> It seems there are issues in header splitting and it was disabled > >>> by default. Header splitting reduces packet processing overhead in > >>> upper layer so it's normal to see better performance with header > >>> splitting. > >>> > >>> > >> The reason that we have had header splitting enabled in the past is that > >> historically there have been issues with memory fragmentation when using > >> 8k jumbo frames (resulting in 9k mbuf's). > >> > >> > > Yes, if you use jumbo frames, header splitting would help to reduce > > memory fragmentation as header splitting wouldn't allocate jumbo > > clusters. > > > > > > Under testing I have yet to see a memory fragmentation issue with this > driver. I follow up if/when I find a problem with this again. > > >> I have a kernel with the following configuration in testing right now: > >> > >> * Flow control enabled. > >> * Jumbo header splitting turned off. > >> > >> > >> Is there any way that we can fix flow control with jumbo header > >> splitting turned on? > >> > >> > > Flow control has nothing to do with header splitting(i.e. flow > > control is always enabled). > > > > > Sorry let me rephrase that: > > Is there a way to fix the RX buffer shortage issues (when header > splitting is turned on) so that they are guarded by flow control. Maybe > change the low watermark for flow control when its enabled? > I'm not sure how much it would help but try changing RX low watermark. Default value is 32 which seems to be reasonable value. But it's only for 5709/5716 controllers and Linux seems to use different default value. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
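As a rough illustration of the RX low watermark Pyun mentions above: the controller tracks how many receive buffers the host has posted, and once the count of still-unused buffers drops to the watermark it asserts 802.3x flow control (a PAUSE frame) instead of dropping frames for lack of buffers. The sketch below is illustrative C only, with made-up names and a made-up ring structure, not bce(4) register code; raising the watermark would simply make the NIC push back on the sender earlier.

#include <stdbool.h>
#include <stdint.h>

#define RX_LOW_WATERMARK 32             /* default value cited in the thread */

struct rx_ring {
        uint32_t prod;                  /* free-running count of buffers posted by the driver */
        uint32_t cons;                  /* free-running count of buffers consumed by the NIC */
};

static bool
rx_should_assert_pause(const struct rx_ring *r)
{
        /* Buffers posted but not yet used; unsigned math handles wrap. */
        uint32_t avail = r->prod - r->cons;

        return (avail <= RX_LOW_WATERMARK);
}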
Re: bge hangs on recent 7.3-STABLE
On Mon, Sep 13, 2010 at 9:04 PM, Pyun YongHyeon wrote: > On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote: >> On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote: >> >> > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote: >> > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote: >> > > > Hi, >> > > > >> > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on >> > > > 11.01.2010 >> > > > and 25.02.2010. Hosts process about 10K input and 10K output packets/s >> > > > without issues. One of them, however, is loaded more than others, so it >> > > > processes 20K/20K packets/s. >> > > > >> > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010. >> > > > Then bge on this host hung two times. I was able to restart it from >> > > > console using: >> > > > /etc/rc.d/netif restart bge0 >> > > > >> > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, >> > > > 07.09.2010. >> > > > After reboot bge hung every several seconds. I was able to restart it, >> > > > but bge hung again after several seconds. >> > > > >> > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since there >> > > > were several if_bge.c commits on 15.08.2010. The same hangs. >> > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before >> > > > the first if_bge.c commit after 25.02.2010. Now it runs without hangs. >> > > > >> > > > The hosts are amd64 dual core SMP with 4G machines. bge information: >> > > > >> > > > b...@pci0:4:0:0: class=0x02 card=0x165914e4 chip=0x165914e4 >> > > > rev=0x11 hdr=0x00 >> > > > vendor = 'Broadcom Corporation' >> > > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)' >> > > > >> > > > bge0: > > > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4 >> > > > miibus1: on bge0 >> > > > brgphy0: PHY 1 on miibus1 >> > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, >> > > > 1000baseT-FDX, auto >> > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a >> > > > >> > > >> > > Could you show me verbose boot message(bge part only)? >> > > Also show me the output of "pciconf -lcbv". >> > > >> > >> > Forgot to send a patch. Let me know whether attached patch fixes >> > the issue or not. >> >> > Index: sys/dev/bge/if_bge.c >> > === >> > --- sys/dev/bge/if_bge.c (revision 212341) >> > +++ sys/dev/bge/if_bge.c (working copy) >> > @@ -3386,9 +3386,11 @@ >> > sc->bge_rx_saved_considx = rx_cons; >> > bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx); >> > if (stdcnt) >> > - bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std); >> > + bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std + >> > + BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT); >> > if (jumbocnt) >> > - bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo); >> > + bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo + >> > + BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT); >> > #ifdef notyet >> > /* >> > * This register wraps very quickly under heavy packet drops. >> >> Thank you, it seems the patch has fixed the bug. >> BTW, I noticed the same hungs on FreeBSD 8.1, date=2010.09.06.23.59.59 >> I will apply the patch on all my updated hosts. >> > > Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and > stable/7(including 8.1-RELEASE and 7.3-RELEASE) may suffer from > this issue. Let me know what other hosts work with the patch. Hi Pyun, Thanks for the patch. 
It seems to have fixed the symptom in my case, on a card identical to Igor's, but on board an IBM eServer 306m. Regards, Vlad -- Good, fast & cheap. Pick any two. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
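For readers of the archive, the patch above is pure index arithmetic: instead of writing the driver's running producer index into the mailbox register, it writes the previous ring slot, computed with wrap-around. A minimal sketch of that computation, assuming the usual 512-entry standard RX ring (the real constant comes from if_bgereg.h):

#define BGE_STD_RX_RING_CNT 512         /* assumed ring size for the example */

static inline unsigned int
bge_prev_slot(unsigned int idx)
{
        /* Adding the ring size minus one before the modulo steps one
         * slot back without ever producing a negative operand. */
        return ((idx + BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT);
}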
Re: bce(4) - com_no_buffers (Again)
On Mon, Sep 13, 2010 at 03:21:13PM -0700, David Christensen wrote: > > I'm under the impression the header splitting in bce(4) is for > > LRO(opposite of TSO), not for VM magic to enable page flipping > > tricks. > > Header splitting was implemented in the Linux version of bce(4) > to prevent jumbo memory allocations. Allocating 9KB frames was > causing problems on systems used for virtualization. (Harder to > find a contiguous 9KB frame when a hypervisor is in use.) Using > 4KB or smaller buffer sizes was considered more compatible with > virtualization. > > LRO (Large Receive Offload, aka Transparent Packet Aggregation > or TPA on the 10Gb controllers) is not supported on the 1Gb > bce(4) devices. > I meant the tcp_lro implementation in FreeBSD. At the moment tcp_lro_rx() runs a long list of sanity checks before combining TCP segments into a single TCP segment, but if the TCP header is split from its payload I guess we can optimize that path. This way we may also be able to support LRO over VLAN, I guess. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
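To give a feel for the "long list of sanity checks" Pyun refers to, here is a purely illustrative sketch of the conditions an LRO path typically verifies before coalescing a segment; the struct and function names are invented and this is not FreeBSD's tcp_lro.c. If the NIC has already split the TCP header from the payload, part of this parsing and checking can be skipped.

#include <stdbool.h>
#include <stdint.h>

struct lro_flow {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint32_t next_seq;              /* sequence number expected next */
};

static bool
lro_can_merge(const struct lro_flow *f, uint32_t src_ip, uint32_t dst_ip,
    uint16_t sport, uint16_t dport, uint32_t seq, bool has_tcp_options)
{
        /* Same flow, strictly in-order, and no TCP options that would
         * be lost by coalescing -- otherwise hand the segment to the
         * stack unmodified. */
        if (f->src_ip != src_ip || f->dst_ip != dst_ip ||
            f->src_port != sport || f->dst_port != dport)
                return (false);
        if (seq != f->next_seq)
                return (false);
        if (has_tcp_options)
                return (false);
        return (true);
}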
RE: bce(4) - com_no_buffers (Again)
> I'm under the impression the header splitting in bce(4) is for > LRO(opposite of TSO), not for VM magic to enable page flipping > tricks. Header splitting was implemented in the Linux version of bce(4) to prevent jumbo memory allocations. Allocating 9KB frames was causing problems on systems used for virtualization. (Harder to find a contiguous 9KB frame when a hypervisor is in use.) Using 4KB or smaller buffer sizes was considered more compatible with virtualization. LRO (Large Receive Offload, aka Transparent Packet Aggregation or TPA on the 10Gb controllers) is not supported on the 1Gb bce(4) devices. Dave ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
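A concrete way to read David's explanation: with header splitting, the driver can chain a small mbuf (headers only) to page-sized payload clusters, so no physically contiguous 9KB allocation is ever needed. The function below is a minimal sketch under that assumption, using the generic mbuf KPIs rather than anything bce(4)-specific; the name rx_alloc_split() is invented.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

static struct mbuf *
rx_alloc_split(void)
{
        struct mbuf *hdr, *payload;

        /* Small mbuf that will receive only the Ethernet/IP/TCP headers. */
        hdr = m_gethdr(M_DONTWAIT, MT_DATA);
        if (hdr == NULL)
                return (NULL);

        /* PAGE_SIZE cluster for the payload; this is what avoids the
         * physically contiguous 9KB jumbo cluster allocation. */
        payload = m_getjcl(M_DONTWAIT, MT_DATA, 0, MJUMPAGESIZE);
        if (payload == NULL) {
                m_freem(hdr);
                return (NULL);
        }

        hdr->m_next = payload;
        return (hdr);
}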
Re: bge hangs on recent 7.3-STABLE
On Tue, Sep 14, 2010 at 01:08:08AM +0300, Vlad Galu wrote: > On Mon, Sep 13, 2010 at 9:04 PM, Pyun YongHyeon wrote: > > On Mon, Sep 13, 2010 at 06:27:08PM +0400, Igor Sysoev wrote: > >> On Thu, Sep 09, 2010 at 02:18:08PM -0700, Pyun YongHyeon wrote: > >> > >> > On Thu, Sep 09, 2010 at 01:10:50PM -0700, Pyun YongHyeon wrote: > >> > > On Thu, Sep 09, 2010 at 02:28:26PM +0400, Igor Sysoev wrote: > >> > > > Hi, > >> > > > > >> > > > I have several hosts running FreeBSD/amd64 7.2-STABLE updated on > >> > > > 11.01.2010 > >> > > > and 25.02.2010. Hosts process about 10K input and 10K output > >> > > > packets/s > >> > > > without issues. One of them, however, is loaded more than others, so > >> > > > it > >> > > > processes 20K/20K packets/s. > >> > > > > >> > > > Recently, I have upgraded one host to 7.3-STABLE, 24.08.2010. > >> > > > Then bge on this host hung two times. I was able to restart it from > >> > > > console using: > >> > > > /etc/rc.d/netif restart bge0 > >> > > > > >> > > > Then I have upgraded the most loaded (20K/20K) host to 7.3-STABLE, > >> > > > 07.09.2010. > >> > > > After reboot bge hung every several seconds. I was able to restart > >> > > > it, > >> > > > but bge hung again after several seconds. > >> > > > > >> > > > Then I have downgraded this host to 7.3-STABLE, 14.08.2010, since > >> > > > there > >> > > > were several if_bge.c commits on 15.08.2010. The same hangs. > >> > > > Then I have downgraded this host to 7.3-STABLE, 17.03.2010, before > >> > > > the first if_bge.c commit after 25.02.2010. Now it runs without > >> > > > hangs. > >> > > > > >> > > > The hosts are amd64 dual core SMP with 4G machines. bge information: > >> > > > > >> > > > b...@pci0:4:0:0: class=0x02 card=0x165914e4 > >> > > > chip=0x165914e4 rev=0x11 hdr=0x00 > >> > > > vendor = 'Broadcom Corporation' > >> > > > device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)' > >> > > > > >> > > > bge0: >> > > > 0x004101> mem 0xfe5f-0xfe5f irq 19 at device 0.0 on pci4 > >> > > > miibus1: on bge0 > >> > > > brgphy0: PHY 1 on miibus1 > >> > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > >> > > > 1000baseT-FDX, auto > >> > > > bge0: Ethernet address: 00:e0:81:5f:6e:8a > >> > > > > >> > > > >> > > Could you show me verbose boot message(bge part only)? > >> > > Also show me the output of "pciconf -lcbv". > >> > > > >> > > >> > Forgot to send a patch. Let me know whether attached patch fixes > >> > the issue or not. > >> > >> > Index: sys/dev/bge/if_bge.c > >> > === > >> > --- sys/dev/bge/if_bge.c (revision 212341) > >> > +++ sys/dev/bge/if_bge.c (working copy) > >> > @@ -3386,9 +3386,11 @@ > >> > sc->bge_rx_saved_considx = rx_cons; > >> > bge_writembx(sc, BGE_MBX_RX_CONS0_LO, sc->bge_rx_saved_considx); > >> > if (stdcnt) > >> > - bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, sc->bge_std); > >> > + bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, (sc->bge_std + > >> > + BGE_STD_RX_RING_CNT - 1) % BGE_STD_RX_RING_CNT); > >> > if (jumbocnt) > >> > - bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, sc->bge_jumbo); > >> > + bge_writembx(sc, BGE_MBX_RX_JUMBO_PROD_LO, (sc->bge_jumbo + > >> > + BGE_JUMBO_RX_RING_CNT - 1) % BGE_JUMBO_RX_RING_CNT); > >> > #ifdef notyet > >> > /* > >> > * This register wraps very quickly under heavy packet drops. > >> > >> Thank you, it seems the patch has fixed the bug.
> >> BTW, I noticed the same hungs on FreeBSD 8.1, date=2010.09.06.23.59.59 > >> I will apply the patch on all my updated hosts. > >> > > > > Thanks for testing. I'm afraid bge(4) in HEAD, stable/8 and > > stable/7(including 8.1-RELEASE and 7.3-RELEASE) may suffer from > > this issue. Let me know what other hosts work with the patch. > > Hi Pyun, > > Thanks for the patch. It seems to have fixed the symptom in my case, > on a card identical to Igor's, but on board of an IBM eServer 306m. > Thanks for reporting and testing! I really appreciate it. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: FreeBSD route tables limited 16?
On 9/13/10 5:18 PM, Dave Seddon wrote: Greetings Julian, I've been wondering if it's possible to increase the number of FreeBSD route tables to a larger number. It seems this is currently 4 bits, however I was wondering about perhaps 16 bits? Yes, the code is designed to handle many more, and if you do create more then everything SHOULD handle it. The bottleneck is that we need to store an associated fib with each outgoing (or for that matter incoming) packet, but we do not at this time want to dedicate a whole word in the mbuf to the task. My "hack" for 8.x (before it was done) was to hide the information in the flags word of the mbuf. I only took 4 bits to make sure I didn't trample on other people's use of bits there. The plan is/was to make a separate entry in the mbuf some time after 7.x branched (say, "now" for example :-) ). You could just steal more bits for now, but if you take 8 bits there will only be one spare. (see /sys/sys/mbuf.h) It may just be time to bite the bullet and steal the entry. Out of curiosity, why do you need > 16 fibs? Have you considered using vnet jails as well?

/* MRT compile-time constants */
#ifdef _KERNEL
 #ifndef ROUTETABLES
  #define RT_NUMFIBS 1
  #define RT_MAXFIBS 1
 #else
  /* while we use 4 bits in the mbuf flags, we are limited to 16 */
  #define RT_MAXFIBS 16
  #if ROUTETABLES > RT_MAXFIBS
   #define RT_NUMFIBS RT_MAXFIBS
   #error "ROUTETABLES defined too big"
  #else
   #if ROUTETABLES == 0
    #define RT_NUMFIBS 1
   #else
    #define RT_NUMFIBS ROUTETABLES
   #endif
  #endif
 #endif
#endif

Really liked your announcement years ago: http://lists.freebsd.org/pipermail/freebsd-arch/2007-December/007331.html Kind regards, Dave Seddon +61 447 SEDDON d...@seddon.ca -Original Message- From: Andrew Hannam To: d...@seddon.ca Subject: RE: FreeBSD route tables - limited to 16 :( Date: Mon, 13 Sep 2010 15:55:47 +1000 Mailer: Microsoft Office Outlook 12.0 I think the gentleman is confusing route-tables with routes. 150K routes is easily possible but it is obvious there is currently only support for up to 16 route tables. I think that you are right and the number of bits will need to be updated. I don't know the answer to the 'route leaking' question and it has been a long time since I looked at this code. You really need to speak to the specialist responsible for the multiple route table code. This person should be clearly marked in the code headers. I'm guessing that no-one has thought about using it the way you are planning to use it. If I get some time I will have a look - but don't hold your breath. Regards, Andrew. -Original Message- From: Dave Seddon [mailto:d...@seddon.ca] Sent: Saturday, 11 September 2010 12:52 AM To: Aldous, Matthew D Cc: d...@seddon.ca; Andrew Hannam; Truman Boyes Subject: RE: FreeBSD route tables - limited to 16 :( Greetings, I'm guessing we need to adjust the number of bits defined for the route table in the mbuf structure definition (wherever that is), then we can update route.h to match. I guess really we should make the mbuf code _and_ the route.h code pick up the KERNCONF definition of the variable ROUTETABLES. Andrew - thoughts on this? I'm not sure if the firewall rules allow you to update the route table variable in the mbuf, but if it doesn't we should allow this. This would be equivalent to what they call 'route leaking' in MPLS speak, when you can pop traffic from one VPN to another (very nasty, but sometimes handy). Yes, ipfw does allow you to do this but it needs some more work.. It only really works as the naive user may expect on incoming packets.
Regards, Dave On Fri, 2010-09-10 at 19:05 +1000, Aldous, Matthew D wrote: From: Dave Seddon [d...@seddon.ca] Sent: Friday, 10 September 2010 6:36 PM To: Andrew Hannam Cc: d...@seddon.ca; Aldous, Matthew D; Truman Boyes Subject: FreeBSD route tables - limited to 16 :( I just tried compiling up FreeBSD 8.1 with 1024 route tables. It's throwing an error, which I tracked down to /usr/src/sys/net/route.h (line 99ish). The limit is 16, because as the comments say this is 4 bits. Need to look into increasing this to, say, 16 bits :). Given each mbuf will have this, it could cause a significant increase in memory usage for a system with a large number of packets (although who cares, RAM is cheap).

/* MRT compile-time constants */
#ifdef _KERNEL
 #ifndef ROUTETABLES
  #define RT_NUMFIBS 1
  #define RT_MAXFIBS 1
 #else
  /* while we use 4 bits in the mbuf flags, we are limited to 16 */
  #define RT_MAXFIBS 16
  #if ROUTETABLES > RT_MAXFIBS
   #define RT_NUMFIBS RT_MAXFIBS
   #error "ROUTETABLES defined too big"
  #else
   #if ROUTETABLES == 0
    #define RT_NUMFIBS 1
   #else
    #define RT_NUMFIBS ROUTETABLES
   #endif
  #endif
 #endif
#endif

___ freebsd-net@freebsd.org mailing list http://lists.freebsd.or
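To make the 4-bit limit discussed in this thread concrete, here is a purely illustrative sketch of stashing a FIB number in spare bits of an mbuf flags word; the macro names and bit position are invented (the real definitions live in sys/sys/mbuf.h). Four reserved bits cap RT_MAXFIBS at 16; widening the field, or giving the FIB its own mbuf field as Julian suggests, is what lifts the limit.

#include <stdint.h>

#define MBUF_FIB_SHIFT  28              /* hypothetical position of the FIB bits */
#define MBUF_FIB_BITS   4               /* 4 bits -> at most 16 FIBs */
#define MBUF_FIB_MASK   (((1u << MBUF_FIB_BITS) - 1) << MBUF_FIB_SHIFT)

static inline void
mbuf_set_fib(uint32_t *m_flags, unsigned int fib)
{
        /* Clear the FIB field, then store the new FIB number in it. */
        *m_flags = (*m_flags & ~MBUF_FIB_MASK) |
            (((uint32_t)fib << MBUF_FIB_SHIFT) & MBUF_FIB_MASK);
}

static inline unsigned int
mbuf_get_fib(uint32_t m_flags)
{
        return ((m_flags & MBUF_FIB_MASK) >> MBUF_FIB_SHIFT);
}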