Re: if_run in hostap mode: issue with stations in the power save mode
Hi!

On 07.02.2011 09:11:02 +0100, Bernhard Schmidt wrote:

> For example, if you call 'ifconfig wlan0 ssid ' the new ssid is
> passed over using a IOCTL. It would be interesting to know what function
> in net80211 are called regarding beacon updates and which of those call
> into the run driver. Ultimately it's about figuring out if special
> handling for such cases are required and if so, how to do it.

I've added debug output for allocation, modification and deallocation of the beacon to if_run.c and tried changing the SSID with net.wlan.0.debug set to -1. Here are the log contents:

kernel: wlan0: ieee80211_init
kernel: wlan0: start running, 1 vaps running
kernel: wlan0: ieee80211_new_state_locked: RUN -> SCAN (nrunning 0 nscanning 0)
kernel: wlan0: ieee80211_newstate_cb: RUN -> INIT arg 0
kernel: wlan0: hostap_newstate: RUN -> INIT (0)
kernel: wlan0: node_reclaim: remove 0xff8003bd7000<00:14:d1:a8:66:1d> from station table, refcnt 1
kernel: wlan0: ieee80211_alloc_node 0xff8004eae000<00:14:d1:a8:66:1d> in station table
kernel: wlan0: [00:14:d1:a8:66:1d] ieee80211_alloc_node: inact_reload 2
kernel: wlan0: ieee80211_newstate_cb: INIT -> SCAN arg 0
kernel: wlan0: hostap_newstate: INIT -> SCAN (0)
kernel: wlan0: ieee80211_create_ibss: creating HOSTAP on channel 6
kernel: wlan0: ieee80211_alloc_node 0xff8003bd7000<00:14:d1:a8:66:1d> in station table
kernel: kernel: wlan0: [00:14:d1:a8:66:1d] ieee80211_alloc_node: inact_reload 2
kernel: wlan0: set WME_AC_BE (chan) [acm 0 aifsn 3 logcwmin 4 logcwmax 6 txop 0]
kernel: wlan0: set WME_AC_BE (bss ) [acm 0 aifsn 3 logcwmin 4 logcwmax 10 txop 0]
kernel: wlan0: set WME_AC_BK (chan) [acm 0 aifsn 7 logcwmin 4 logcwmax 10 txop 0]
kernel: wlan0: set WME_AC_BK (bss ) [acm 0 aifsn 7 logcwmin 4 logcwmax 10 txop 0]
kernel: wlan0: set WME_AC_VI (chan) [acm 0 aifsn 1 logcwmin 3 logcwmax 4 txop 94]
kernel: wlan0: set WME_AC_VI (bss ) [acm 0 aifsn 2 logcwmin 3 logcwmax 4 txop 94]
kernel: wlan0: set WME_AC_VO (chan) [acm 0 aifsn 1 logcwmin 2 logcwmax 3 txop 47]
kernel: wlan0: set WME_AC_VO (bss ) [acm 0 aifsn 2 logcwmin 2 logcwmax 3 txop 47]
kernel: wlan0: ieee80211_wme_updateparams_locked: WME params updated, cap_info 0x6
kernel: wlan0: ieee80211_new_state_locked: SCAN -> RUN (nrunning 0 nscanning 0)
kernel: wlan0: ieee80211_newstate_cb: SCAN -> RUN arg -1
kernel: run0: run_update_beacon_cb: updating beacon
kernel: wlan0: ieee80211_beacon_update: traffic 0, enable aggressive mode
kernel: wlan0: update WME_AC_BE (chan+bss) [acm 0 aifsn 2 logcwmin 4 logcwmax 10 txop 0]
kernel: wlan0: update WME_AC_BE (chan+bss) logcwmin 3
kernel: wlan0: ieee80211_wme_updateparams_locked: WME params updated, cap_info 0x7
kernel: wlan0: hostap_newstate: SCAN -> RUN (-1)
kernel: wlan0: synchronized with 00:14:d1:a8:66:1d ssid "test" channel 6 start 0Mb
kernel: wlan0: [00:14:d1:a8:66:1d] ieee80211_node_authorize: inact_reload 20

As you can see, run_update_beacon_cb() is invoked, but at this time the beacon is already allocated. Because the beacon is allocated, run_update_beacon_cb() invokes ieee80211_beacon_update(). As we know, ieee80211_beacon_update() doesn't update the SSID, so the SSID remains untouched.

Nevertheless, changing or hiding/unhiding the SSID seems to work. This can be explained: the station uses an active scan. ieee80211_send_proberesp()/ieee80211_alloc_proberesp() return a frame containing the updated SSID, but the AP continues to broadcast beacons with the outdated data.

A possible solution is to deallocate the beacon on a state change. I've decided to deallocate the beacon on the 'to RUN' state transition. The additional patch is attached.

I'll do additional tests later today...
-- 
Alexander Zagrebin

--- /sys/dev/usb/wlan/if_run.c.orig	2011-02-08 09:52:18.994743647 +0300
+++ /sys/dev/usb/wlan/if_run.c	2011-02-08 11:04:17.114484851 +0300
@@ -1793,6 +1793,12 @@ run_newstate(struct ieee80211vap *vap, e
 		sc->runbmap |= bid;
 	}
 
+	if (rvp->beacon_mbuf) {
+		m_freem(rvp->beacon_mbuf);
+		rvp->beacon_mbuf = NULL;
+	}
+
 	switch (vap->iv_opmode) {
 	case IEEE80211_M_HOSTAP:
 	case IEEE80211_M_MBSS:

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
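The reasoning in this message can be sketched as a small userland model. This is not driver code: every `model_*` name below is hypothetical, and the only behaviour taken from the message is that the in-place update path leaves the SSID untouched, so a cached beacon keeps the old SSID until it is freed and rebuilt on the transition to RUN.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver state; not the real net80211 types. */
struct model_beacon {
	char ssid[33];		/* SSID copied into the frame at allocation time */
	int  updates;		/* number of in-place updates applied */
};

struct model_vap {
	char ssid[33];			/* currently configured SSID */
	struct model_beacon *beacon;	/* cached beacon, NULL if none */
};

/* Build a fresh beacon from the current configuration. */
static struct model_beacon *
model_beacon_alloc(const struct model_vap *vap)
{
	struct model_beacon *b = calloc(1, sizeof(*b));

	if (b != NULL)
		memcpy(b->ssid, vap->ssid, sizeof(b->ssid));
	return (b);
}

/* In-place update: as described in the message, the SSID is not touched. */
static void
model_beacon_update(struct model_beacon *b)
{
	assert(b != NULL);
	b->updates++;
}

/* Allocate once, afterwards only update in place. */
static void
model_update_beacon_cb(struct model_vap *vap)
{
	if (vap->beacon == NULL)
		vap->beacon = model_beacon_alloc(vap);
	else
		model_beacon_update(vap->beacon);
}

/* Transition to RUN; drop_cached selects the behaviour of the patch. */
static void
model_newstate_run(struct model_vap *vap, int drop_cached)
{
	if (drop_cached && vap->beacon != NULL) {
		free(vap->beacon);
		vap->beacon = NULL;
	}
	model_update_beacon_cb(vap);
}

/* Change the SSID and re-enter RUN; returns 1 if the beacon shows the new SSID. */
static int
model_ssid_change_is_visible(int drop_cached)
{
	struct model_vap vap = { "test", NULL };
	int visible;

	model_newstate_run(&vap, drop_cached);
	strcpy(vap.ssid, "renamed");
	model_newstate_run(&vap, drop_cached);
	visible = (vap.beacon != NULL &&
	    strcmp(vap.beacon->ssid, vap.ssid) == 0);
	free(vap.beacon);
	return (visible);
}
```

Calling `model_ssid_change_is_visible(0)` models the behaviour before the patch (the beacon still carries the old SSID), and `model_ssid_change_is_visible(1)` models the behaviour with the cached beacon dropped on the way to RUN.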
Re: Fwd: igb driver RX (was TX) hangs when out of mbuf clusters
Hello, Karim.
You wrote on February 8, 2011, 6:29:53:

> Precisely, the exact same behavior happens (RX hang) if options
> DEVICE_POLLING is _not_ used in the kernel configuration file. I tried with
> POLLING since someone mentioned that it helped in a case mentioned earlier
> today. Unfortunately for igb with or without polling yields the same rx ring
> filing problem.

In my case (em(4), not igb(4), but the symptoms are VERY similar) POLLING (both as a kernel option AND "ifconfig em0 polling") leads to resets (which drop all connections!) AFTER kernel messages like these:

em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 1302, hw tdt = 1265
em0: TX(0) desc avail = 31,Next TX to Clean = 1296

-- 
// Black Lion AKA Lev Serebryakov
ipfw, ipv6 and gif(4)
Hi.

I'm running FreeBSD 8.1-STABLE (I had major issues with em(4) on 8.1-RELEASE, so I had to upgrade this host to a more recent STABLE). I'm using an ipv6-over-ipv4 tunnel:

gif0: flags=8051 metric 0 mtu 1280
        tunnel inet 89.250.210.67 --> 216.66.80.26
        inet6 2001:470:1f08:14c0::2 --> 2001:470:1f08:14c0::1 prefixlen 128
        nd6 options=3
        options=1

For it to work I have to allow ipv4 packets between these two hosts (these are the first two rules in the filter):

5 14 1072 allow log ip4 from 89.250.210.67 to 216.66.80.26 out via vlan104
6 14 1072 allow log ip4 from 216.66.80.26 to 89.250.210.67 in via vlan104

The thing is, normally (at least in the ipv4 world) I would have to allow ipencap packets between these hosts (and that's what I did first), but this configuration never worked. I even added 'allow' rules for every type of encapsulation from /etc/protocols, only to see that their counters never changed from zero. The two rules above were made after an 'ok, let's allow everything just to see in the log what it wants' decision.

I want to ask: why ip4?

And the log looks even more weird:

%ping6 2001:470:1f08:14c0::1
PING6(56=40+8+8 bytes) 2001:470:1f08:14c0::2 --> 2001:470:1f08:14c0::1
16 bytes from 2001:470:1f08:14c0::1, icmp_seq=0 hlim=64 time=93.917 ms
16 bytes from 2001:470:1f08:14c0::1, icmp_seq=1 hlim=64 time=93.307 ms

Feb 8 13:56:48 ns kernel: ipfw: 5 Accept P:41 89.250.210.67 216.66.80.26 out via vlan104
Feb 8 13:56:48 ns kernel: ipfw: 6 Accept P:41 216.66.80.26 89.250.210.67 in via vlan104
Feb 8 13:56:49 ns kernel: ipfw: 5 Accept P:41 89.250.210.67 216.66.80.26 out via vlan104
Feb 8 13:56:49 ns kernel: ipfw: 6 Accept P:41 216.66.80.26 89.250.210.67 in via vlan104

As you can see, P:41 is IPv6:

%grep 41 /etc/protocols
ipv6    41      IPV6            # ipv6

And, of course, ipfw doesn't allow me to create the rules it is actually logging:

%ipfw add 7 allow 41 from 216.66.80.26 to 89.250.210.67 in via vlan104
ipfw: bad address "216.66.80.26"

Do I misunderstand the concept, or is this how it really should look?

Thanks.
Eugene.
Re: if_run in hostap mode: issue with stations in the power save mode
On Tuesday, February 08, 2011 09:24:29 Alexander Zagrebin wrote:
> Hi!
>
> On 07.02.2011 09:11:02 +0100, Bernhard Schmidt wrote:
> > For example, if you call 'ifconfig wlan0 ssid ' the new
> > ssid is passed over using a IOCTL. It would be interesting to know
> > what function in net80211 are called regarding beacon updates and
> > which of those call into the run driver. Ultimately it's about
> > figuring out if special handling for such cases are required and
> > if so, how to do it.
>
> I've added a debug output on allocation, changing and deallocation of
> a beacon into if_run.c and tried to change SSID while the
> net.wlan.0.debug is -1. Here is the log contents:
>
> kernel: wlan0: ieee80211_init
> kernel: wlan0: start running, 1 vaps running
> kernel: wlan0: ieee80211_new_state_locked: RUN -> SCAN (nrunning 0 nscanning 0)
> kernel: wlan0: ieee80211_newstate_cb: RUN -> INIT arg 0
> kernel: wlan0: hostap_newstate: RUN -> INIT (0)
> kernel: wlan0: node_reclaim: remove 0xff8003bd7000<00:14:d1:a8:66:1d> from station table, refcnt 1
> kernel: wlan0: ieee80211_alloc_node 0xff8004eae000<00:14:d1:a8:66:1d> in station table
> kernel: wlan0: [00:14:d1:a8:66:1d] ieee80211_alloc_node: inact_reload 2
> kernel: wlan0: ieee80211_newstate_cb: INIT -> SCAN arg 0
> kernel: wlan0: hostap_newstate: INIT -> SCAN (0)
> kernel: wlan0: ieee80211_create_ibss: creating HOSTAP on channel 6
> kernel: wlan0: ieee80211_alloc_node 0xff8003bd7000<00:14:d1:a8:66:1d> in station table
> kernel: kernel: wlan0: [00:14:d1:a8:66:1d] ieee80211_alloc_node: inact_reload 2
> kernel: wlan0: set WME_AC_BE (chan) [acm 0 aifsn 3 logcwmin 4 logcwmax 6 txop 0]
> kernel: wlan0: set WME_AC_BE (bss ) [acm 0 aifsn 3 logcwmin 4 logcwmax 10 txop 0]
> kernel: wlan0: set WME_AC_BK (chan) [acm 0 aifsn 7 logcwmin 4 logcwmax 10 txop 0]
> kernel: wlan0: set WME_AC_BK (bss ) [acm 0 aifsn 7 logcwmin 4 logcwmax 10 txop 0]
> kernel: wlan0: set WME_AC_VI (chan) [acm 0 aifsn 1 logcwmin 3 logcwmax 4 txop 94]
> kernel: wlan0: set WME_AC_VI (bss ) [acm 0 aifsn 2 logcwmin 3 logcwmax 4 txop 94]
> kernel: wlan0: set WME_AC_VO (chan) [acm 0 aifsn 1 logcwmin 2 logcwmax 3 txop 47]
> kernel: wlan0: set WME_AC_VO (bss ) [acm 0 aifsn 2 logcwmin 2 logcwmax 3 txop 47]
> kernel: wlan0: ieee80211_wme_updateparams_locked: WME params updated, cap_info 0x6
> kernel: wlan0: ieee80211_new_state_locked: SCAN -> RUN (nrunning 0 nscanning 0)
> kernel: wlan0: ieee80211_newstate_cb: SCAN -> RUN arg -1
> kernel: run0: run_update_beacon_cb: updating beacon
> kernel: wlan0: ieee80211_beacon_update: traffic 0, enable aggressive mode
> kernel: wlan0: update WME_AC_BE (chan+bss) [acm 0 aifsn 2 logcwmin 4 logcwmax 10 txop 0]
> kernel: wlan0: update WME_AC_BE (chan+bss) logcwmin 3
> kernel: wlan0: ieee80211_wme_updateparams_locked: WME params updated, cap_info 0x7
> kernel: wlan0: hostap_newstate: SCAN -> RUN (-1)
> kernel: wlan0: synchronized with 00:14:d1:a8:66:1d ssid "test" channel 6 start 0Mb
> kernel: wlan0: [00:14:d1:a8:66:1d] ieee80211_node_authorize: inact_reload 20
>
> As you can see, run_update_beacon_cb() is invoked, but at this time
> the beacon is already allocated. As the beacon is allocated,
> run_update_beacon_cb() invokes ieee80211_beacon_update(). As we
> know, the ieee80211_beacon_update() doesn't update the SSID, so the
> SSID remains untouched.
> Nevertheless the changing or hiding/unhiding a SSID seems to be
> working. It is possible to explain: the station uses an active scan.
> The ieee80211_send_proberesp()/ieee80211_alloc_proberesp() returns
> the frame, containing an updated SSID, but AP continues to broadcast
> beacon with the outdated data.
> The possible solution is to deallocate a beacon on a state change.
> I've decided to deallocate a beacon on 'to RUN' state transition.
> The additional patch is attached.
> I'll do an additional tests later today...

Thank you. That's what I expected, actually: when we go through state changes (RUN -> ... -> RUN), net80211 expects us to throw away most of the knowledge we have.

This seems to be the safest solution. When the beacon mbuf is completely thrown away and created from scratch, we can be absolutely sure we have handled all cases.

-- 
Bernhard
Re: if_run in hostap mode: issue with stations in the power save mode
On Tuesday, February 08, 2011 02:18:30 PseudoCylon wrote:
> - Original Message
>
> > From: Bernhard Schmidt
> > To: PseudoCylon
> > Cc: Alexander Zagrebin ; freebsd-net@freebsd.org
> > Sent: Sun, February 6, 2011 3:42:43 AM
> > Subject: Re: if_run in hostap mode: issue with stations in the
> > power save mode
> > Afaik iwn(4) doesn't use PS, never got around implementing that.
> >
> > > I'd like to move ieee80211_beacon_alloc() into iv_vap_alloc().
> > > Then we don't need to test beacon_mbuf == NULL in
> > > run_update_beacon_cb(), and there is already switch we can use
> > > for conditionally alloc mem.
> >
> > Sounds fine with we.
>
> Oops, there is switch before malloc vap. the test is still
> in run_update_beacon_cb()
>
> > Can I talk you into integrating that into Alexander's patch?
>
> The patch is attached. (diff to HEAD) Bit long, just because there is
> a couple of new call back functions to avoid LOR.

Thank you!

I've combined both patches (see attachment), if I get an ACK from both of you I'll try get this into the tree ASAP.

-- 
Bernhard

Index: sys/dev/usb/wlan/if_runvar.h
===
--- sys/dev/usb/wlan/if_runvar.h	(revision 218367)
+++ sys/dev/usb/wlan/if_runvar.h	(working copy)
@@ -121,6 +121,7 @@ struct run_cmdq {
 struct run_vap {
 	struct ieee80211vap		vap;
 	struct ieee80211_beacon_offsets	bo;
+	struct mbuf			*beacon_mbuf;
 
 	int				(*newstate)(struct ieee80211vap *,
 					    enum ieee80211_state, int);
Index: sys/dev/usb/wlan/if_run.c
===
--- sys/dev/usb/wlan/if_run.c	(revision 218367)
+++ sys/dev/usb/wlan/if_run.c	(working copy)
@@ -388,6 +388,7 @@ static void	run_scan_end(struct ieee80211com *);
 static void	run_update_beacon(struct ieee80211vap *, int);
 static void	run_update_beacon_cb(void *);
 static void	run_updateprot(struct ieee80211com *);
+static void	run_updateprot_cb(void *);
 static void	run_usb_timeout_cb(void *);
 static void	run_reset_livelock(struct run_softc *);
 static void	run_enable_tsf_sync(struct run_softc *);
@@ -398,6 +399,7 @@ static void	run_set_leds(struct run_softc *, uint1
 static void	run_set_bssid(struct run_softc *, const uint8_t *);
 static void	run_set_macaddr(struct run_softc *, const uint8_t *);
 static void	run_updateslot(struct ifnet *);
+static void	run_updateslot_cb(void *);
 static void	run_update_mcast(struct ifnet *);
 static int8_t	run_rssi2dbm(struct run_softc *, uint8_t, uint8_t);
 static void	run_update_promisc_locked(struct ifnet *);
@@ -674,7 +676,7 @@ run_attach(device_t self)
 	ic->ic_set_channel = run_set_channel;
 	ic->ic_node_alloc = run_node_alloc;
 	ic->ic_newassoc = run_newassoc;
-	//ic->ic_updateslot = run_updateslot;
+	ic->ic_updateslot = run_updateslot;
 	ic->ic_update_mcast = run_update_mcast;
 	ic->ic_wme.wme_update = run_wme_update;
 	ic->ic_raw_xmit = run_raw_xmit;
@@ -856,6 +858,9 @@ run_vap_delete(struct ieee80211vap *vap)
 
 	RUN_LOCK(sc);
 
+	m_freem(rvp->beacon_mbuf);
+	rvp->beacon_mbuf = NULL;
+
 	rvp_id = rvp->rvp_id;
 	sc->ratectl_run &= ~(1 << rvp_id);
 	sc->rvp_bmap &= ~(1 << rvp_id);
@@ -1790,6 +1795,9 @@ run_newstate(struct ieee80211vap *vap, enum ieee80
 		sc->runbmap |= bid;
 	}
 
+	m_freem(rvp->beacon_mbuf);
+	rvp->beacon_mbuf = NULL;
+
 	switch (vap->iv_opmode) {
 	case IEEE80211_M_HOSTAP:
 	case IEEE80211_M_MBSS:
@@ -3901,8 +3909,29 @@ run_update_beacon(struct ieee80211vap *vap, int it
 {
 	struct ieee80211com *ic = vap->iv_ic;
 	struct run_softc *sc = ic->ic_ifp->if_softc;
+	struct run_vap *rvp = RUN_VAP(vap);
+	int mcast = 0;
 	uint32_t i;
 
+	KASSERT(vap != NULL, ("no beacon"));
+
+	switch (item) {
+	case IEEE80211_BEACON_ERP:
+		run_updateslot(ic->ic_ifp);
+		break;
+	case IEEE80211_BEACON_HTINFO:
+		run_updateprot(ic);
+		break;
+	case IEEE80211_BEACON_TIM:
+		mcast = 1;	/*TODO*/
+		break;
+	default:
+		break;
+	}
+
+	setbit(rvp->bo.bo_flags, item);
+	ieee80211_beacon_update(vap->iv_bss, &rvp->bo, rvp->beacon_mbuf, mcast);
+
 	i = RUN_CMDQ_GET(&sc->cmdq_store);
 	DPRINTF("cmdq_store=%d\n", i);
 	sc->cmdq[i].func = run_update_beacon_cb;
@@ -3916,6 +3945,7 @@ static void
 run_update_beacon_cb(void *arg)
 {
 	struct ieee80211vap *vap = arg;
+	struct run_vap *rvp = RUN_VAP(vap);
 	struct ieee80211com *ic = vap->iv_ic;
 	struct run_softc *sc = ic->ic_ifp->if_softc;
 	struct rt2860_txwi txwi;
@@ -3925,8 +3955,17 @@ run_update_beacon_cb(void *arg)
 	if (vap->iv_bss->ni_chan == IEEE80211_CHAN_ANYC)
 		return;
 
-	if ((m = ieee80211_beacon_alloc(vap->iv_bss, &RUN_VAP(vap)->bo)) == NULL)
-		return;
+	/*
+	 * No need to call ieee80211_beacon_update(), run_update_beacon()
+	 * is taking care of apropriate calls.
+	 */
+	if (rvp->beacon_mbuf == NULL) {
+		rvp->beacon_mbuf = ieee80211_beacon_alloc(vap->iv_bss,
+		    &rvp->bo);
+		if (rvp->beacon_mbuf == NULL)
+			return;
+	}
+	m = rv
Re: igb driver RX (was TX) hangs when out of mbuf clusters
On Feb 8, 2011, at 10:10 AM, Lev Serebryakov wrote:

> Hello, Karim.
> You wrote on February 8, 2011, 6:29:53:
>
>> Precisely, the exact same behavior happens (RX hang) if options
>> DEVICE_POLLING is _not_ used in the kernel configuration file. I tried with
>> POLLING since someone mentioned that it helped in a case mentioned earlier
>> today. Unfortunately for igb with or without polling yields the same rx ring
>> filing problem.
> In my case (em(4), not igb(4) but symptoms are VERY similar) POLLING
> (both as kernel option AND "ifconfig em0 polling") options leads to
> resets (which drops all connections!) AFTER such kernel messages:
>
> em0: Watchdog timeout -- resetting
> em0: Queue(0) tdh = 1302, hw tdt = 1265
> em0: TX(0) desc avail = 31,Next TX to Clean = 1296

Can you apply the attached patch and report what the output for rx_nxt_refresh and rx_nxt_check is?

Best regards
Michael

Attachment: patch (binary data)

>
> --
> // Black Lion AKA Lev Serebryakov
Re: igb driver RX (was TX) hangs when out of mbuf clusters
On Feb 8, 2011, at 4:29 AM, Karim Fodil-Lemelin wrote: > 2011/2/7 Pyun YongHyeon > >> On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote: >>> 2011/2/7 Pyun YongHyeon >>> On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote: > Subject: Re: igb driver tx hangs when out of mbuf clusters > >> To: Lev Serebryakov >> Cc: freebsd-net@freebsd.org >> >> >> 2011/2/7 Lev Serebryakov >> >> Hello, Karim. >>> You wrote 7 февраля 2011 г., 19:58:04: >>> >>> The issue is with the igb driver from 7.4 RC3 r218406. If the >> driver >>> runs out of mbuf clusters it simply stops receiving even after the clusters >>> have been freed. >>> It looks like my problems with em0 (see thread "em0 hangs >> without >>> any messages like "Watchdog timeout", only down/up reset it.")... >>> Codebase for em and igb is somewhat common... >>> >>> -- >>> // Black Lion AKA Lev Serebryakov >>> >>> I agree. >> >> Do you get missed packets in mac_stats (sysctl dev.em | grep >> missed)? >> >> I might not have mentioned but I can also 'fix' the problem by >> doing >> ifconfig igb0 down/up. >> >> I will try using POLLING to 'automatize' the reset as you mentioned >> in your >> thread. >> >> Karim. >> >> > Follow up on tests with POLLING: The problem is still occurring >> although it > takes more time ... Outputs of sysctl dev.igb0 and netstat -m will follow: > > 9219/99426/108645 mbufs in use (current/cache/total) > 9217/90783/10/10 mbuf clusters in use >> (current/cache/total/max) Do you see network processes are stuck in keglim state? If you see that I think that's not trivial to solve. You wouldn't even kill that process if it is under keglim state unless some more mbuf clusters are freed from other places. 
>>> >>> No keglim state, here is a snapshot of top -SH while the problem is >>> happening: >>> >>> 12 root 171 ki31 0K 8K CPU5 5 19:27 100.00% idle: >>> cpu5 >>> 10 root 171 ki31 0K 8K CPU7 7 19:26 100.00% idle: >>> cpu7 >>> 14 root 171 ki31 0K 8K CPU3 3 19:25 100.00% idle: >>> cpu3 >>> 11 root 171 ki31 0K 8K CPU6 6 19:25 100.00% idle: >>> cpu6 >>> 13 root 171 ki31 0K 8K CPU4 4 19:24 100.00% idle: >>> cpu4 >>> 15 root 171 ki31 0K 8K CPU2 2 19:22 100.00% idle: >>> cpu2 >>> 16 root 171 ki31 0K 8K CPU1 1 19:18 100.00% idle: >>> cpu1 >>> 17 root 171 ki31 0K 8K RUN0 19:12 100.00% idle: >>> cpu0 >>> 18 root -32- 0K 8K WAIT 6 0:04 0.10% swi4: >>> clock s >>> 20 root -44- 0K 8K WAIT 4 0:08 0.00% swi1: >> net >>> 29 root -68- 0K 8K - 0 0:02 0.00% igb0 >> que >>> 35 root -68- 0K 8K - 2 0:02 0.00% em1 >> taskq >>> 28 root -68- 0K 8K WAIT 5 0:01 0.00% irq256: >>> igb0 >>> >>> keep in mind that num_queues has been forced to 1. >>> >>> I think both igb(4) and em(4) pass received frame to upper stack before allocating new RX buffer. If driver fails to allocate new RX buffer driver will try to refill RX buffers in next run. Under extreme resource shortage case, this situation can produce no more RX buffers in RX descriptor ring and this will take the box out of network. Other drivers avoid that situation by allocating new RX buffer before passing received frame to upper stack. If RX buffer allocation fails driver will just reuse old RX buffer without passing received frame to upper stack. That does not completely solve the keglim issue though. I think you should have enough mbuf cluters to avoid keglim. However the output above indicates you have enough free mbuf clusters. So I guess igb(4) encountered zero available RX buffer situation in past but failed to refill the RX buffer again. I guess driver may be able to periodically check available RX buffers. Jack may have better idea if this was the case.(CCed) >>> >>> That is exactly the pattern. 
The driver runs out of clusters but they >>> eventually get consumed and freed although the driver refuses to process >> any >>> new frames. It is, on the other hand, perfectly capable of sending out >>> packets. >>> >> >> Ok, this clearly indicates igb(4) failed to refill RX buffers since >> you can still send frames. I'm not sure whether igb(4) controllers >> could be configured to generate no RX buffer interrupts but that >> interrupt would be better suited to trigger RX refilling than timer >> based refilling. Since igb(4) keeps track of available RX buffers, >> igb(4) can selectively enable that i
Re: ipfw, ipv6 and gif(4)
Hi,

> On Tue, 08 Feb 2011 14:05:38 +0500
> "Eugene M. Zheganin" said:

emz> As you can see, P:41 is IPv6:

emz> %grep 41 /etc/protocols
emz> ipv6    41      IPV6            # ipv6

emz> And, of course, ipfw doesn't allow me to create the rules it is
emz> actually logging:

emz> %ipfw add 7 allow 41 from 216.66.80.26 to 89.250.210.67 in via vlan104
emz> ipfw: bad address "216.66.80.26"

emz> Do I misunderstand the concept, or is it how it really should look ?

Something like `pass ip4 from any to any proto ipv6' should work for you.

Sincerely,

-- 
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
u...@mahoroba.org ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/
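Why the log shows `P:41` on an IPv4 rule can be illustrated with a short sketch: the firewall sees the outer IPv4 header of the tunnelled packet, and its protocol field carries 41 (the `ipv6` entry in /etc/protocols) rather than 4 (`ipencap`). The helper name `outer_proto` and the sample header bytes are made up for illustration; only the addresses come from the thread, and the checksum is deliberately left at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>	/* IPPROTO_IPV6 */

/*
 * Return the protocol number carried in an IPv4 header, or -1 if the
 * buffer is shorter than a minimal header or is not IPv4.  The protocol
 * field is the tenth byte (offset 9) of the header.
 */
static int
outer_proto(const uint8_t *pkt, size_t len)
{
	if (len < 20 || (pkt[0] >> 4) != 4)
		return (-1);
	return (pkt[9]);
}

/* Illustrative outer header for the gif tunnel: 89.250.210.67 -> 216.66.80.26. */
static const uint8_t sample_gif_hdr[20] = {
	0x45, 0x00, 0x00, 0x4c,		/* version 4, IHL 5, TOS 0, total length 76 */
	0x00, 0x00, 0x00, 0x00,		/* identification, flags, fragment offset */
	0x40, 41,   0x00, 0x00,		/* TTL 64, protocol 41, checksum not computed */
	89, 250, 210, 67,		/* source address */
	216, 66, 80, 26			/* destination address */
};
```

For this header, `outer_proto(sample_gif_hdr, sizeof(sample_gif_hdr))` yields 41, which is the value of `IPPROTO_IPV6`. That matches the suggestion above to match IPv4 packets whose protocol is ipv6.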
IPv6 Extension Headers
Hi,

I'm looking for some guidance on implementing extension headers in the kernel, both for outgoing packets and for processing incoming packets. Is anybody available to discuss it with me (on or off the mailing list) to help me get the ball rolling?

Thanks
RE: divert rewrite
> -Original Message-
> From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-
> n...@freebsd.org] On Behalf Of Sergey Matveychuk
> Sent: Monday, February 07, 2011 11:37 PM
> To: Julian Elischer
> Cc: Ivo Vachkov; FreeBSD Net
> Subject: Re: divert rewrite
>
> 06.02.2011 4:42, Julian Elischer wrote:
> > On 2/5/11 4:09 PM, Ivo Vachkov wrote:
> >> Hello,
> >>
> >> How can I help?
> >
> > if you have ipv6 connectivity and experience, I have no experience or
> > connectivity, with it so
> > I'll be coding blind and will need a tester.
> > If you have an application for IPV6 testing that would be even better.
> > Divert is often used for NAT but that doesn't seem very useful for IPv6 and
> > natd doesn't support it anyhow.
>
> Object :)
> Divert is really useful way to get packets from firewall to userspace,
> analyse or process them some way and put them back. Really I see no
> other way for this for IPv6. I've tried ng_socket+ng_nat but there is no
> easy way to put a packet back in firewall.
>
> I'm very interested in the process. And I'm ready to help in testing.

Did you try ng_ether + ng_ksocket?
It can pass Ethernet frames, encapsulated in UDP, to a user space receiver.
Re: igb driver RX (was TX) hangs when out of mbuf clusters
> 2011/2/8 Michael Tüxen > >> On Feb 8, 2011, at 4:29 AM, Karim Fodil-Lemelin wrote: >> >> > 2011/2/7 Pyun YongHyeon >> > >> >> On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote: >> >>> 2011/2/7 Pyun YongHyeon >> >>> >> On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote: >> > Subject: Re: igb driver tx hangs when out of mbuf clusters >> > >> >> To: Lev Serebryakov >> >> Cc: freebsd-net@freebsd.org >> >> >> >> >> >> 2011/2/7 Lev Serebryakov >> >> >> >> Hello, Karim. >> >>> You wrote 7 февраля 2011 г., 19:58:04: >> >>> >> >>> >> The issue is with the igb driver from 7.4 RC3 r218406. If the >> >> driver >> >>> runs >> out of mbuf clusters it simply stops receiving even after the >> clusters >> >>> have >> been freed. >> >>> It looks like my problems with em0 (see thread "em0 hangs >> >> without >> >>> any messages like "Watchdog timeout", only down/up reset it.")... >> >>> Codebase for em and igb is somewhat common... >> >>> >> >>> -- >> >>> // Black Lion AKA Lev Serebryakov >> >>> >> >>> I agree. >> >> >> >> Do you get missed packets in mac_stats (sysctl dev.em | grep >> >> missed)? >> >> >> >> I might not have mentioned but I can also 'fix' the problem by >> >> doing >> >> ifconfig igb0 down/up. >> >> >> >> I will try using POLLING to 'automatize' the reset as you mentioned >> >> in >> your >> >> thread. >> >> >> >> Karim. >> >> >> >> >> > Follow up on tests with POLLING: The problem is still occurring >> >> although >> it >> > takes more time ... Outputs of sysctl dev.igb0 and netstat -m will >> follow: >> > >> > 9219/99426/108645 mbufs in use (current/cache/total) >> > 9217/90783/10/10 mbuf clusters in use >> >> (current/cache/total/max) >> >> Do you see network processes are stuck in keglim state? If you see >> that I think that's not trivial to solve. You wouldn't even kill >> that process if it is under keglim state unless some more mbuf >> clusters are freed from other places. 
>> >> >>> >> >>> No keglim state, here is a snapshot of top -SH while the problem is >> >>> happening: >> >>> >> >>> 12 root 171 ki31 0K 8K CPU5 5 19:27 100.00% >> idle: >> >>> cpu5 >> >>> 10 root 171 ki31 0K 8K CPU7 7 19:26 100.00% >> idle: >> >>> cpu7 >> >>> 14 root 171 ki31 0K 8K CPU3 3 19:25 100.00% >> idle: >> >>> cpu3 >> >>> 11 root 171 ki31 0K 8K CPU6 6 19:25 100.00% >> idle: >> >>> cpu6 >> >>> 13 root 171 ki31 0K 8K CPU4 4 19:24 100.00% >> idle: >> >>> cpu4 >> >>> 15 root 171 ki31 0K 8K CPU2 2 19:22 100.00% >> idle: >> >>> cpu2 >> >>> 16 root 171 ki31 0K 8K CPU1 1 19:18 100.00% >> idle: >> >>> cpu1 >> >>> 17 root 171 ki31 0K 8K RUN0 19:12 100.00% >> idle: >> >>> cpu0 >> >>> 18 root -32- 0K 8K WAIT 6 0:04 0.10% swi4: >> >>> clock s >> >>> 20 root -44- 0K 8K WAIT 4 0:08 0.00% swi1: >> >> net >> >>> 29 root -68- 0K 8K - 0 0:02 0.00% igb0 >> >> que >> >>> 35 root -68- 0K 8K - 2 0:02 0.00% em1 >> >> taskq >> >>> 28 root -68- 0K 8K WAIT 5 0:01 0.00% >> irq256: >> >>> igb0 >> >>> >> >>> keep in mind that num_queues has been forced to 1. >> >>> >> >>> >> >> I think both igb(4) and em(4) pass received frame to upper stack >> before allocating new RX buffer. If driver fails to allocate new RX >> buffer driver will try to refill RX buffers in next run. Under >> extreme resource shortage case, this situation can produce no more >> RX buffers in RX descriptor ring and this will take the box out of >> network. Other drivers avoid that situation by allocating new RX >> buffer before passing received frame to upper stack. If RX buffer >> allocation fails driver will just reuse old RX buffer without >> passing received frame to upper stack. That does not completely >> solve the keglim issue though. I think you should have enough mbuf >> cluters to avoid keglim. >> >> However the output above indicates you have enough free mbuf >> clusters. So I guess igb(4) encountered zero available RX buffer >> situation in past but failed to refill the RX buffer again. 
I guess >> driver may be able to periodically check available RX buffers. >> Jack may have better idea if this was the case.(CCed) >> >> >>> >> >>> That is exactly the pattern. The driver runs out of clusters but they >> >>> eventually get consumed and freed although the driver refuses to >> process >> >> any >> >>> new frames. It is, on the other hand, perfectly capable of sending out >> >>>
Re: divert rewrite
07.02.2011 18:36, Sergey Matveychuk wrote:
> 06.02.2011 4:42, Julian Elischer wrote:
>> On 2/5/11 4:09 PM, Ivo Vachkov wrote:
>>> Hello,
>>>
>>> How can I help?
>>
>> if you have ipv6 connectivity and experience, I have no experience or
>> connectivity, with it so
>> I'll be coding blind and will need a tester.
>> If you have an application for IPV6 testing that would be even better.
>> Divert is often used for NAT but that doesn't seem very useful for IPv6 and
>> natd doesn't support it anyhow.
>
> Object :)
> Divert is really useful way to get packets from firewall to userspace,
> analyse or process them some way and put them back. Really I see no
> other way for this for IPv6. I've tried ng_socket+ng_nat but there is no
> easy way to put a packet back in firewall.

Oops, I meant ng_socket+ng_ipfw here.
Re: divert rewrite
08.02.2011 19:08, rozhuk...@gmail.com wrote:
> Did you try ng_ether + ng_ksocket?
> It can translate Ethernet frames incapsulated to udp to user space receiver.

The idea is to catch packets from the firewall (ng_ipfw; ng_nat was mentioned by mistake) and pass them to a user space module that does some processing and puts the packets back into the firewall (for rules with the `diverted' keyword). This works now for IPv4 with `divert' and doesn't work for IPv6.
TCP can advertise a really huge window
This is a very bizarre edge case, so bear with me. I was debugging an edge case at work recently that occurred when the socket buffer was filled up exactly (i.e. sbspace(&so->so_rcv) == 0). In TCP terms, this would be when rcv_nxt == rcv_adv. To simulate the real workload I had a very fast writer blasting over lo0 to a slow reader but had used a small buffer and turned off window scaling (as I had to fill the entire socket buffer, I was chasing an off-by-1 bug). However, I ended up with some bizarre behavior. I think it is less confusing to describe the sequence of events that I now know happened than how I figured this out, so here goes.

- Assume we have advertised a window size of N which corresponds exactly to sbspace(&so->so_rcv).

- The remote peer sends a packet of length N filling our window. We respond with a zero-window ACK. This advances rcv_nxt to == rcv_adv, but it does not grow rcv_adv because sbspace() is currently 0.

- The userland app very slowly drains data from the socket buffer. However, the calls to tcp_usr_recvd() do not trigger a window update because in this case the link is over lo0 which has a relatively large t_maxseg (about 14k) and this condition in tcp_output() is not met:

    if (adv >= (long) (2 * tp->t_maxseg))
        goto send;
    if (2 * adv >= (long) so->so_rcv.sb_hiwat)
        goto send;

- A timer at the remote peer expires and it sends a window probe with one byte of data. Since userland has read some data (just not 2 * MSS), we accept this packet. However, receiving this packet moves rcv_nxt += 1, so rcv_nxt is now > rcv_adv.

- We call tcp_output() to ACK the window probe and as part of this calculate the receive window to advertise here:

    if (recwin < (long)(so->so_rcv.sb_hiwat / 4) &&
        recwin < (long)tp->t_maxseg)
        recwin = 0;
    if (recwin < (long)(tp->rcv_adv - tp->rcv_nxt))
        recwin = (long)(tp->rcv_adv - tp->rcv_nxt);
    if (recwin > (long)TCP_MAXWIN << tp->rcv_scale)
        recwin = (long)TCP_MAXWIN << tp->rcv_scale;

The "surprise" kicks in on the second conditional. The problem is that rcv_adv - rcv_nxt is now equal to (uint32_t)-1. On a 32-bit machine the cast to (long) effectively just makes this value signed and thus -1. On a 64-bit machine you actually end up with a ginormous value of 2^32 - 1, or a 4GB window (minus a byte). The third conditional truncates that to the maximum window we can advertise, but this value may be larger than the actual space in the socket buffer. The remote peer now has a huge window to throw data into.

At work this proved disastrous. I'm not sure if there are any practical concerns. This is the patch I'm using as a fix:

Index: tcp_output.c
===
--- tcp_output.c (revision 215582)
+++ tcp_output.c (working copy)
@@ -928,7 +928,8 @@
     if (recwin < (long)(so->so_rcv.sb_hiwat / 4) &&
         recwin < (long)tp->t_maxseg)
         recwin = 0;
-    if (recwin < (long)(tp->rcv_adv - tp->rcv_nxt))
+    if (SEQ_GT(tp->rcv_adv, tp->rcv_nxt) &&
+        recwin < (long)(tp->rcv_adv - tp->rcv_nxt))
         recwin = (long)(tp->rcv_adv - tp->rcv_nxt);
     if (recwin > (long)TCP_MAXWIN << tp->rcv_scale)
         recwin = (long)TCP_MAXWIN << tp->rcv_scale;

-- 
John Baldwin
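For anyone who wants to see the sign problem from the message above in isolation, here is a minimal userland sketch. It is not the tcp_output() code: int64_t stands in for a 64-bit long, SEQ_GT is redefined locally in the usual modulo-2^32 style, and the helper names (recwin_unpatched, recwin_patched, recwin_clamp) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

#define TCP_MAXWIN        65535
#define TCP_MAX_WINSHIFT  14

/* Local stand-in for the kernel macro: compare sequence numbers modulo 2^32. */
#define SEQ_GT(a, b) ((int32_t)((tcp_seq)(a) - (tcp_seq)(b)) > 0)

/* Unpatched logic: the unsigned 32-bit difference is widened to 64 bits. */
static int64_t
recwin_unpatched(int64_t recwin, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
    if (recwin < (int64_t)(tcp_seq)(rcv_adv - rcv_nxt))
        recwin = (int64_t)(tcp_seq)(rcv_adv - rcv_nxt);
    return (recwin);
}

/* Patched logic: only trust the difference when rcv_adv is ahead of rcv_nxt. */
static int64_t
recwin_patched(int64_t recwin, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
    if (SEQ_GT(rcv_adv, rcv_nxt) &&
        recwin < (int64_t)(tcp_seq)(rcv_adv - rcv_nxt))
        recwin = (int64_t)(tcp_seq)(rcv_adv - rcv_nxt);
    return (recwin);
}

/* The third conditional: clamp to the largest window the scale allows. */
static int64_t
recwin_clamp(int64_t recwin, unsigned rcv_scale)
{
    int64_t maxwin;

    assert(rcv_scale <= TCP_MAX_WINSHIFT);
    maxwin = (int64_t)TCP_MAXWIN << rcv_scale;
    return (recwin > maxwin ? maxwin : recwin);
}
```

With rcv_adv = 1000, rcv_nxt = 1001 and 512 bytes of real buffer space, the unpatched helper produces 2^32 - 1, which the clamp cuts down to 65535 at scale 0; the patched helper leaves the window at 512.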
A small TCP bug: excessive duplicate ACKs
One thing I've noticed at work is that if a receiver's socket buffer fills and the receiver then drains the buffer all at once, we send a lot of duplicate ACKs. I narrowed this down to being due to the abnormally high window scaling factor we have. We set kern.ipc.maxsockbuf to 314572800 which results in a window scaling factor of 8k. This interacts poorly with the logic that decides whether or not to force a window update in tcp_output():

    /*
     * Compare available window to amount of window
     * known to peer (as advertised window less
     * next expected input).  If the difference is at least two
     * max size segments, or at least 50% of the maximum possible
     * window, then want to send a window update to peer.
     * Skip this if the connection is in T/TCP half-open state.
     * Don't send pure window updates when the peer has closed
     * the connection and won't ever send more data.
     */
    if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
        !TCPS_HAVERCVDFIN(tp->t_state)) {
        /*
         * "adv" is the amount we can increase the window,
         * taking into account that we are limited by
         * TCP_MAXWIN << tp->rcv_scale.
         */
        long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
            (tp->rcv_adv - tp->rcv_nxt);

        if (adv >= (long) (2 * tp->t_maxseg))
            goto send;
        if (2 * adv >= (long) so->so_rcv.sb_hiwat)
            goto send;
    }

Specifically, we can send a duplicate ACK when (2 * tp->t_maxseg) or (so->so_rcv.sb_hiwat / 2) are less than the window scaling factor. I have a test app that you can run against a TCP chargen service from inetd to reproduce it. I also have two TCP dumps from before and after.

The patch I'm using to fix this is below (I could rework it to not use the extra goto perhaps, but went with a simple hack to minimize reindenting for now):

Index: tcp_output.c
===
--- tcp_output.c (revision 217650)
+++ tcp_output.c (working copy)
@@ -560,11 +560,19 @@
         long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
             (tp->rcv_adv - tp->rcv_nxt);
 
+        /*
+         * If the new window size ends up being the same as the old
+         * size when it is scaled, then don't force a window update.
+         */
+        if ((tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale ==
+            (adv + tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale)
+            goto dontupdate;
         if (adv >= (long) (2 * tp->t_maxseg))
             goto send;
         if (2 * adv >= (long) so->so_rcv.sb_hiwat)
             goto send;
     }
+dontupdate:
 
     /*
      * Send if we owe the peer an ACK, RST, SYN, or urgent data.  ACKNOW

Note that if the ACK sequence number has moved then I think other checks in tcp_output() will still force an ACK packet out, so I don't think this will cause us to miss sending ACKs to the peers.

You can find the test app source (tcpslow.c) and the dumps at http://people.freebsd.org/~jhb/tcpslow/

If you look at tcp_bad.out, the receiver stops reading data and its socket buffer fills up around packet 72 or so. The receiver wakes up at packet 88 and drains the buffer, causing a small storm of window updates. However, due to the scaling factor, it actually sends duplicate ACKs in batches of three (3 ACKs for 8k window, 3 ACKs for 16k window, etc.). This happens each time the receiver wakes up and drains a full socket buffer. The tcp_good.out dump shows the stream with the patch applied. A similar event of the receiver draining a full buffer starts at packet 83 and it sends a single ACK for each "real" window update.

-- 
John Baldwin
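The check added by the patch above boils down to comparing the window before and after the update once both are shifted by the scale. A small sketch of just that comparison follows; the function name window_update_visible and its plain integer parameters are hypothetical stand-ins for the tcpcb fields, so treat it as an illustration rather than the kernel logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * old_win is the window the peer already knows about (rcv_adv - rcv_nxt),
 * adv is the number of bytes we could add to it.  The header only carries
 * the window shifted right by rcv_scale, so an update is visible to the
 * peer only if the shifted value changes.
 */
static bool
window_update_visible(uint32_t old_win, uint32_t adv, unsigned rcv_scale)
{
    assert(rcv_scale <= 14);    /* largest shift TCP allows */
    return ((old_win >> rcv_scale) != ((old_win + adv) >> rcv_scale));
}
```

With a shift of 13 (the 8k factor mentioned above) and an assumed 1448-byte segment size, an adv of 2896 bytes satisfies the "two segments" test, yet window_update_visible(0, 2896, 13) is false: the ACK would carry the same window as the previous one, which is the duplicate ACK being described.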
Re: divert rewrite
> 08.02.2011 19:08, rozhuk...@gmail.com wrote:
> > Did you try ng_ether + ng_ksocket?
> > It can carry Ethernet frames encapsulated in UDP to a user-space receiver.
>
> The idea is to catch packets from the firewall (ng_ipfw; ng_nat was mentioned by mistake) and pass them to a user-space module that does some processing and puts the packets back into the firewall (for rules with the `diverted' keyword).

Yes, however did you try the ipfw netgraph keyword and the ng_ipfw node? I have also been wondering if it might not make sense to simply replace the divert code with a netgraph equivalent. Using the ng_ipfw node one can almost do it with no changes as it is.

> It works now for IPv4 with `divert' and doesn't with IPv6.

Yes, I'm pondering the right fix for that.
TCP connections stuck in persist state
I ran into a problem recently where a TCP socket seemed to never exit persist mode. What would happen is that the sender was blasting data faster than the receiver could receive it. When the receiver read some data, the sender would start sending again and everything would resume. However, after 3-4 instances of this, the sender would decide to not resume sending data when the receiver opened the window. Instead, it would slowly send a byte every few seconds via the persist timer even though the receiver was advertising a 64k window when it ACKed each of the window probes sent by the sender.

I dug around in kgdb and found that both snd_cwnd and snd_ssthresh were set to 0 on the sender side. I think this means that the send window is effectively permanently stuck at zero as a result. (The tcpcb is also IN_FASTRECOVERY() on the sender side, probably from the storm of duplicate ACKs from the receiver when it sends a bunch of window updates; see my earlier e-mail to net@ for the source of the duplicate ACKs.)

Anyway, I think that this code in tcp_input() is what keeps the window at zero:

    /*
     * If the congestion window was inflated to account
     * for the other side's cached packets, retract it.
     */
    if (tcp_do_newreno || (tp->t_flags & TF_SACK_PERMIT)) {
        if (IN_FASTRECOVERY(tp)) {
            if (SEQ_LT(th->th_ack, tp->snd_recover)) {
                if (tp->t_flags & TF_SACK_PERMIT)
                    tcp_sack_partialack(tp, th);
                else
                    tcp_newreno_partial_ack(tp, th);
            } else {
                /*
                 * Out of fast recovery.
                 * Window inflation should have left us
                 * with approximately snd_ssthresh
                 * outstanding data.
                 * But in case we would be inclined to
                 * send a burst, better to do it via
                 * the slow start mechanism.
                 */
                KASSERT(tp->snd_ssthresh != 0,
                    ("using bogus snd_ssthresh"));
                if (SEQ_GT(th->th_ack + tp->snd_ssthresh,
                    tp->snd_max))
                    tp->snd_cwnd = tp->snd_max -
                        th->th_ack + tp->t_maxseg;
                else
                    tp->snd_cwnd = tp->snd_ssthresh;
            }
        }

Specifically, since snd_recover and snd_una seem to keep advancing in lock-step with each window update, I think it ends up falling down to the last statement each time, where snd_cwnd = snd_ssthresh, thus keeping snd_cwnd at 0. This then causes a zero send window in tcp_output():

    sendwin = min(tp->snd_wnd, tp->snd_cwnd);
    sendwin = min(sendwin, tp->snd_bwnd);

Now, looking at the code, I can see no way that snd_ssthresh should ever be zero. It seems to always be calculated from some number of segments times t_maxseg. The one exception to this rule is when it is restored from snd_ssthresh_prev due to a bad retransmit (this example is from tcp_input()):

    /*
     * If we just performed our first retransmit, and the ACK
     * arrives within our recovery window, then it was a mistake
     * to do the retransmit in the first place.  Recover our
     * original cwnd and ssthresh, and proceed to transmit where
     * we left off.
     */
    if (tp->t_rxtshift == 1 && (int)(ticks - tp->t_badrxtwin) < 0) {
        ++tcpstat.tcps_sndrexmitbad;
        tp->snd_cwnd = tp->snd_cwnd_prev;
        tp->snd_ssthresh = tp->snd_ssthresh_prev;
        tp->snd_recover = tp->snd_recover_prev;
        if (tp->t_flags & TF_WASFRECOVERY)
            ENTER_FASTRECOVERY(tp);
        tp->snd_nxt = tp->snd_max;
        tp->t_badrxtwin = 0;    /* XXX probably not required */
    }

So then my working theory is that somehow, snd_ssthresh_prev is being used when it hasn't been initialized. I then checked 'ticks' on my host and found that it had wrapped
Re: divert rewrite
08.02.2011 20:03, Julian Elischer wrote:
> > 08.02.2011 19:08, rozhuk...@gmail.com wrote:
> > > Did you try ng_ether + ng_ksocket?
> > > It can carry Ethernet frames encapsulated in UDP to a user-space receiver.
> >
> > The idea is to catch packets from the firewall (ng_ipfw; ng_nat was mentioned by mistake) and pass them to a user-space module that does some processing and puts the packets back into the firewall (for rules with the `diverted' keyword).
>
> Yes, however did you try the ipfw netgraph keyword and the ng_ipfw node? I have also been wondering if it might not make sense to simply replace the divert code with a netgraph equivalent. Using the ng_ipfw node one can almost do it with no changes as it is.

I've tried ng_socket+ng_ipfw. It gets incoming packets, but outgoing packets are dropped because a tag is lost after leaving kernel space. It looks like some magic can be done with the ng_tag node, but I really could not tame it.

> > It works now for IPv4 with `divert' and doesn't with IPv6.
>
> Yes, I'm pondering the right fix for that.

I'd like to be the first to test it, please :)
Re: Proposed patch for Port Randomization modifications according to RFC6056
I've been up and running on this patch vs. r218391 for over 24 hours now, using algorithm 4 (as someone said is now the default in Linux) without any problems. I think Bjoern is better qualified than I to comment on the style of the patch, but it applies cleanly, and seems to run fine on both v4 and v6.

hth,

Doug

On 01/31/2011 04:52, Ivo Vachkov wrote:

Hello,

I attach the latest version of the port randomization code as a patch against RELENG_8.

Changelog:
1) sysctl variable names are changed to:
- 'net.inet.ip.portrange.randomalg.version' - representing the algorithm of choice.
- 'net.inet.ip.portrange.randomalg.alg5_tradeoff' - representing the Algorithm 5 computational tradeoff value (the 'N' value in the Algorithm 5 description in RFC 6056).
2) Code comments are synchronized with the current variable names.

Ivo Vachkov

On Sat, Jan 29, 2011 at 4:27 AM, Doug Barton wrote:
On 01/28/2011 11:57, Ivo Vachkov wrote:
On Fri, Jan 28, 2011 at 9:00 PM, Doug Barton wrote:

How does net.inet.ip.portrange.randomalg sound? I would also suggest that the second sysctl be named net.inet.ip.portrange.randomalg.alg5_tradeoff so that one could do 'sysctl net.inet.ip.portrange.randomalg' and see both values. But I won't quibble on that. :)

I have no objections to this. Since this is my first attempt to contribute something back to the community I decided to see how it's been done before. So I found:

net.inet.tcp.rfc1323
net.inet.tcp.rfc3465
net.inet.tcp.rfc3390
net.inet.tcp.rfc3042

which probably led me in the wrong direction :)

Yeah, I had actually intended to say something to the effect of "there are plenty of unfortunate examples in the tree already so your doing it that way is totally understandable" but I trimmed it.

I understand your point and agree with it. However, my somewhat limited understanding of the sysctl internal organization is telling me that a tree node does not support values. Am I wrong?

You are likely correct. :) It's an inconvenient fact that I often forget because that's not the sandbox that I usually play in.

If my reasoning is correct, maybe I can create the sysctl variables with the following names:
- net.inet.ip.portrange.randomalg (Tree Node)
- net.inet.ip.portrange.randomalg.alg[orithm] (Leaf Node, to store the selected algorithm)

I would go with "version" to increase the visual distinctiveness. I searched the current tree and there doesn't seem to be a clear winner for how to portray "this is the current N/M that is in use" but "version" seems to have the most representatives.

- net.inet.ip.portrange.randomalg.alg5_tradeoff (Leaf Node, to store the Algorithm 5 trade-off value)

I'm assuming this is the "N" value mentioned in the RFC. If so, I commend you on your choice of "tradeoff" to represent it. :)
RE: divert rewrite
> -Original Message-
> From: Sergey Matveychuk [mailto:s...@freebsd.org]
> Sent: Wednesday, February 09, 2011 12:53 AM
> To: rozhuk...@gmail.com
> Cc: freebsd-net@freebsd.org
> Subject: Re: divert rewrite
>
> 08.02.2011 19:08, rozhuk...@gmail.com wrote:
> > Did you try ng_ether + ng_ksocket?
> > It can carry Ethernet frames encapsulated in UDP to a user-space receiver.
>
> The idea is to catch packets from the firewall (ng_ipfw; ng_nat was mentioned by mistake) and pass them to a user-space module that does some processing and puts the packets back into the firewall (for rules with the `diverted' keyword).
>
> It works now for IPv4 with `divert' and doesn't with IPv6.

I know how divert works, google: uTPControl ;)
It's simple to develop with and stable, but uses a lot of CPU.

With ng_ether + ng_ksocket you can send custom Ethernet frames.
There is a node that can filter traffic; for IPv6 you need to allow 1 or 2 Ethernet types to pass.
Re: divert rewrite
08.02.2011 21:47, rozhuk...@gmail.com wrote:
> > -Original Message-
> > From: Sergey Matveychuk [mailto:s...@freebsd.org]
> > Sent: Wednesday, February 09, 2011 12:53 AM
> > To: rozhuk...@gmail.com
> > Cc: freebsd-net@freebsd.org
> > Subject: Re: divert rewrite
> >
> > 08.02.2011 19:08, rozhuk...@gmail.com wrote:
> > > Did you try ng_ether + ng_ksocket?
> > > It can carry Ethernet frames encapsulated in UDP to a user-space receiver.
> >
> > The idea is to catch packets from the firewall (ng_ipfw; ng_nat was mentioned by mistake) and pass them to a user-space module that does some processing and puts the packets back into the firewall (for rules with the `diverted' keyword).
> >
> > It works now for IPv4 with `divert' and doesn't with IPv6.
>
> I know how divert works, google: uTPControl ;)
> It's simple to develop with and stable, but uses a lot of CPU.
>
> With ng_ether + ng_ksocket you can send custom Ethernet frames.
> There is a node that can filter traffic; for IPv6 you need to allow 1 or 2 Ethernet types to pass.

I know. But I've written a module to work in conjunction with ipfw. It decides by some criteria whether to pass traffic or block it. Administrators in our networks decide in their firewalls what kind of traffic to pass to my module (mostly TCP SYN and a few UDP). So working in conjunction with ipfw is the goal.
Re: bogus 0 len IP packet, was: Hang in VOP_LOCK1_APV on 8-STABLE with NFS.
On Mon, 07 Feb 2011 01:22:36 +0100, Pyun YongHyeon wrote:
On Sun, Feb 06, 2011 at 11:54:49PM +0100, Ronald Klop wrote:
On Sat, 22 Jan 2011 00:01:47 +0100, Ronald Klop wrote:
On Tue, 18 Jan 2011 09:38:04 +0100, wrote:

So, does anyone have an idea why the IP length field would be set to 0 for these TCP/IP packets? Here's some info from Ronald w.r.t. his hardware. (All I can think of is that he could try disabling TSO, etc?) Thanks in advance for any help with this, rick

It seems that issue came from TSO. Driver will set ip_len and ip_sum field to 0 before passing the TCP segment to controller. The failed length were 4446, 5858, 3034 and 4310 and the total number of such frames are more than 35k within 90 seconds. Since failed length 4310 is continuously repeated I guess there is edge case where em(4) didn't free failed TCP segment for TSO. I remember there was commit to HEAD(r217295) which could be related with this issue.

I'm seeing the same problem with Broadcom NetXtreme (bce) cards:

bce0@pci0:3:0:0: class=0x02 card=0x03421014 chip=0x164c14e4 rev=0x12 hdr=0x00
    vendor = 'Broadcom Corporation'
    device = 'Broadcom NetXtreme II Gigabit Ethernet Adapter (BCM5708)'
    class = network
    subclass = ethernet

This is with 8.2-PRERELEASE. Turning off TSO (ifconfig bce0 -tso) removes the problem.

Steinar Haug, Nethelp consulting, sth...@nethelp.no

I tried -tso and -txcsum in various combinations, but it didn't solve the problem. I will look for another brand of network card to try. But this has to wait till Monday when I'm at the office again.

I also used another network card (rl0) and it has the same problem with NFS. I'm going to change some network cables to see if that helps. I have some hints that there might be something wrong with that.

Hmm, given that rl(4) also shows the issue it seems the issue could be in TCP/IP stack, not in driver side. rl(4) is dumb device so network stack should do segmentation and checksum computation. I highly doubt the issue came from faulty cable since other users also reported the same issue. Unfortunately I have no clue yet and I was not able to reproduce it on my box. I vaguely guess some code in kernel changed the ip_len to 0 in the middle of transmission. Rick's captured traffic looks normal except 0 ip_len given that controller is computing checksum on the fly. If mbuf chain was corrupted (e.g. m_len == 0) driver would have failed to send those frames.

Changing the cable didn't help indeed. I'm glad the issue is seen by others too. I will try to downgrade to an older version of FreeBSD to try to find the commit which broke it. But that can take a while, because it is time consuming and I also have to do some real work at work. :-)

Thanks for taking the time for it and I hope we will find the cause someday,

Ronald.
jumbo frames + geom_mirror = no net
Hi!

I have 8.2 + latest updates, em + gigabit net, a few HDDs in a mirror, and Samba to share the HDDs with Windows hosts. (E5300, G33 + ICH9R, 2GB, PCI-E Intel desktop GbE adapter)

ifconfig_em0="inet 172.16.0.254 netmask 255.255.255.0 mtu 9000"

When I start copying files to the mirror (through the net or using cp from other HDDs on the host), after some time the host stops responding on the net, with no error messages in the logs. (top shows free mem < 10 MB just before it happens.) (It started after the first mirror was created.)

It does not respond until a reboot or, from the console:

ifconfig em0 mtu 1500

vmstat shows many failures for mbuf_jumbo_9k:

mbuf_jumbo_9k: 9216, 6400, 0, 0, 36202768, 8472897

but no failures for the simple 1.5k mbufs :/

vmstat -z
ITEM: SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES
UMA Kegs: 128, 0, 97, 23, 97, 0
UMA Zones: 888, 0, 97, 3, 97, 0
UMA Slabs: 284, 0, 1068, 514, 90680, 0
UMA RCntSlabs: 544, 0, 2649, 529, 13469257, 0
UMA Hash: 128, 0, 1, 29, 4, 0
16 Bucket: 76, 0, 12, 88, 122, 0
32 Bucket: 140, 0, 16, 96, 308, 0
64 Bucket: 268, 0, 43, 69, 561, 13
128 Bucket: 524, 0, 323, 6, 599692, 5140
VM OBJECT: 136, 0, 19972, 36317, 1260502, 0
MAP: 136, 0, 7, 22, 7, 0
KMAP ENTRY: 72, 57505, 2188, 2423, 4613615, 0
MAP ENTRY: 72, 0, 2418, 762, 4043456, 0
DP fakepg: 72, 0, 0, 0, 0, 0
SG fakepg: 72, 0, 0, 0, 0, 0
mt_zone: 2056, 0, 175, 237, 175, 0
16: 16, 0, 4239, 836, 16310509, 0
32: 32, 0, 2514, 1780, 124403442, 0
64: 64, 0, 5044, 3334, 402804800, 0
128: 128, 0, 792, 1428, 7002330, 0
256: 256, 0, 592, 593, 1093975, 0
512: 512, 0, 165, 1035, 671161, 0
1024: 1024, 0, 55, 181, 882578, 0
2048: 2048, 0, 150, 250, 73957, 0
4096: 4096, 0, 132, 243, 734964, 0
Files: 56, 0, 5737, 628, 8850040, 0
TURNSTILE: 72, 0, 416, 64, 451, 0
umtx pi: 52, 0, 0, 0, 0, 0
PROC: 680, 0, 81, 171, 38814, 0
THREAD: 720, 0, 277, 138, 11326, 0
SLEEPQUEUE: 44, 0, 416, 115, 451, 0
VMSPACE: 228, 0, 57, 164, 31952, 0
cpuset: 40, 0, 2, 182, 2, 0
mbuf_packet: 256, 0, 4100, 609, 76165127, 0
mbuf: 256, 0, 175, 1206, 398211372, 0
mbuf_cluster: 2048, 65536, 4845, 453, 30848491, 0
mbuf_jumbo_page: 4096, 12800, 0, 0, 0, 0
mbuf_jumbo_9k: 9216, 6400, 0, 0, 36202768, 8472897
mbuf_jumbo_16k: 16384, 3200, 0, 0, 0, 0
mbuf_ext_refcnt: 4, 0, 4, 402, 25681942, 0
ttyoutq: 256, 0, 64, 41, 136, 0
g_bio: 140, 0, 0, 1568, 205564803, 0
ttyinq: 152, 0, 120, 62, 255, 0
ata_request: 208, 0, 0, 304, 67886293, 0
ata_composite: 180, 0, 0, 0, 0, 0
VNODE: 268, 0, 19632, 27674, 5052284, 0
VNODEPOLL: 60, 0, 16, 173, 21, 0
S VFS Cache: 72, 0, 12259, 51977, 1280216, 0
L VFS Cache: 292, 0, 2976, 729, 70042, 0
NAMEI: 1024, 0, 0, 260, 153323563, 0
DIRHASH: 1024, 0, 57, 343, 57340, 0
AIO: 120, 0, 3, 93, 19, 0
AIOP: 16, 0, 4, 402, 44, 0
AIOCB: 292, 0, 0, 390, 6897050, 0
AIOL: 64, 0, 0, 0, 0, 0
AIOLIO: 168, 0, 0, 0, 0, 0
pipe: 392, 0, 50, 170, 25245, 0
ksiginfo:
Re: igb driver RX (was TX) hangs when out of mbuf clusters
I have been following this, and thinking about it. I still am working from a theoretical standpoint, but based on a patch I got quite a long time back and never quite grokked, I believe now that I might have a solution.

The original PR and patch was kern/150516 from Beezar Liu. I was never quite comfortable with the code changes, nor convinced that it was a real issue and not a misunderstanding. However, I think now that this very report might be behind what we are seeing today. I have a slightly different approach to solving it; of course it remains to be seen if it handles it properly.

Please try the patch I've attached. I'm open to further correction or polishing of the changes. And thanks to Beezar for his original report and changes. This is not for em, but if this eliminates the problem it's clearly needed in all drivers.

Jack

Index: if_igb.c
===
--- if_igb.c (revision 218463)
+++ if_igb.c (working copy)
@@ -4312,6 +4312,7 @@
         struct mbuf *sendmp, *mh, *mp;
         struct igb_rx_buf *rxbuf;
         u16 hlen, plen, hdr, vtag;
+        int commit;
         bool eop = FALSE;
 
         cur = &rxr->rx_base[i];
@@ -4440,10 +4441,22 @@
         bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
             BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
+        commit = i; /* capture the old index */
+
         /* Advance our pointers to the next descriptor. */
         if (++i == adapter->num_rx_desc)
             i = 0;
         /*
+        ** Sanity test for ring full, if this
+        ** happens we need to refresh immediately
+        ** or refresh may deadlock.
+        */
+        if (i == rxr->next_to_refresh) {
+            igb_refresh_mbufs(rxr, commit);
+            processed = 0;
+        }
+
+        /*
         ** Send to the stack or LRO
         */
         if (sendmp != NULL) {
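The ring-full test in the patch above can be looked at on its own as plain index arithmetic. The sketch below is only an illustration: struct rx_ring and the helpers ring_next and ring_full_after are hypothetical and are not the igb driver's data structures. It shows the condition the patch checks, namely that the index after the descriptor just processed has caught up with next_to_refresh.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-queue RX ring state. */
struct rx_ring {
    unsigned num_desc;          /* number of descriptors in the ring */
    unsigned next_to_refresh;   /* first slot still waiting for a fresh mbuf */
};

/* Advance a ring index by one slot, wrapping at the end of the ring. */
static unsigned
ring_next(const struct rx_ring *r, unsigned i)
{
    assert(i < r->num_desc);
    return ((i + 1 == r->num_desc) ? 0 : i + 1);
}

/*
 * True when the slot after i is next_to_refresh, i.e. processing has
 * gone all the way around the ring without any slot being refreshed.
 */
static bool
ring_full_after(const struct rx_ring *r, unsigned i)
{
    return (ring_next(r, i) == r->next_to_refresh);
}
```

In this model, with four descriptors and next_to_refresh at 0, the condition only holds after slot 3 has been processed; any earlier refresh moves next_to_refresh forward and the test stays false.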
Re: igb driver RX (was TX) hangs when out of mbuf clusters
2011/2/8 Jack Vogel

> I have been following this, and thinking about it. I still am working from
> a theoretical standpoint, but based on a patch I got quite a long time back
> and never quite grokked, I believe now that I might have a solution.
>
> The original PR and patch was kern/150516 from Beezar Liu. I was never
> quite comfortable with the code changes, nor convinced that it was a real
> issue and not a misunderstanding. However, I think now that this very
> report might be behind what we are seeing today. I have a slightly
> different approach to solving it; of course it remains to be seen if it
> handles it properly.
>
> Please try the patch I've attached. I'm open to further correction or
> polishing of the changes. And thanks to Beezar for his original report and
> changes. This is not for em, but if this eliminates the problem it's
> clearly needed in all drivers.
>
> Jack

Hi Jack,

Thanks for your help. I tried your patch and it didn't work, so I added a couple of printfs to see if the added code was getting hit:

--- a/freebsd/sys/dev/e1000/if_igb.c
+++ b/freebsd/sys/dev/e1000/if_igb.c
@@ -612,7 +612,7 @@ igb_attach(device_t dev)
             device_get_nameunit(dev));
 
     INIT_DEBUGOUT("igb_attach: end");
-
+    printf("this driver has a patch from Jack Vogel\n");
     return (0);
 
 err_late:
@@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int *done)
         struct mbuf *sendmp, *mh, *mp;
         struct igb_rx_buf *rxbuf;
         u16 hlen, plen, hdr, vtag;
+        int commit;
         bool eop = FALSE;
 
         cur = &rxr->rx_base[i];
@@ -4255,10 +4256,23 @@ next_desc:
         bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
             BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
+        commit = i; /* capture the old index */
+
         /* Advance our pointers to the next descriptor. */
         if (++i == adapter->num_rx_desc)
             i = 0;
         /*
+        ** Sanity test for ring full, if this
+        ** happens we need to refresh immediately
+        ** or refresh may deadlock.
+        */
+        if (i == rxr->next_to_refresh) {
+            igb_refresh_mbufs(rxr, commit);
+            printf("igb_refresh_mbufs called with commit %d\n", commit);
+            processed = 0;
+        }
+
+        /*
         ** Send to the stack or LRO
         */
         if (sendmp != NULL) {

Here are the results:

# dmesg | grep Vogel
this driver has a patch from Jack Vogel
this driver has a patch from Jack Vogel
# netstat -m
60453/52707/113160 mbufs in use (current/cache/total)
48416/51584/10/10 mbuf clusters in use (current/cache/total/max)
2894/690 mbuf+clusters out of packet secondary zone in use (current/cache)
11946/854/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
164834K/119760K/284595K bytes allocated to network (current/cache/total)
0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
# dmesg | grep commit

At this point RX has hung. Somehow the check (i == rxr->next_to_refresh) is never true in this case.

Also, I did read kern/150516 and couldn't wrap my head around the patch for the em driver that Beezar Liu suggested.

Regards,

Karim.
Re: if_run in hostap mode: issue with stations in the power save mode
Hi!

On 08.02.2011 10:52:53 +0100, Bernhard Schmidt wrote:

> I've combined both patches (see attachment), if I get an ACK from both
> of you I'll try get this into the tree ASAP.

The resulting patch works fine for me.
Big thanks for your help!

Waiting for the 802.11n support... :)

-- 
Alexander Zagrebin
Re: if_run in hostap mode: issue with stations in the power save mode
- Original Message
> From: Bernhard Schmidt
> To: PseudoCylon ; Alexander Zagrebin
> Cc: freebsd-net@freebsd.org
> Sent: Tue, February 8, 2011 2:52:53 AM
> Subject: Re: if_run in hostap mode: issue with stations in the power save mode
>
> > The patch is attached. (diff to HEAD) Bit long, just because there is
> > a couple of new call back functions to avoid LOR.
>
> Thank you!
>
> I've combined both patches (see attachment), if I get an ACK from both
> of you I'll try get this into the tree ASAP.
>
> --
> Bernhard

No objection from me.

Thanks
AK
Slow Intel 10GbE CX4 adapter behaviour
Hi, we're a medium-sized ISP that needs to pass all incoming user traffic through an Intel Server Systems FreeBSD PC and its dummynet pipes. Up until yesterday it had two 1 Gb em cards, one for input, one for output. As we were approaching the bandwidth limit we swapped the cards for a two-port Intel 10GbE CX4 PCI-E adapter. With the then-used FreeBSD 7.2 and the built-in FreeBSD ixgbe driver 1.7.3 (IIRC) it was very slow: at only about 300-400 mbps load (~30-50 IP kpps) internet access was very slow. Also, there were many "IP fragmentation failed" errors (1-30 kpps in "systat -ip"). So I decided to source-upgrade the world to 8.3-RC3 (ixgbe 2.3.8). Late last night I didn't have much opportunity to test the newer FreeBSD under load, but in the time we did have, the same slowness started happening at about 300-400 mbps load. There are no more fragmentation failed errors. No evident drops as per "netstat -s | fgrep drop". Only the speed is slooow. Even the ssh console lags a bit. Both ix0 and ix1 are configured at their default settings.

Then I read something about the number of ixgbe device descriptors (hw.ixgbe.txd & hw.ixgbe.rxd) being set low at 256 by default, with up to 4096 permitted. But after some grepping through the source tree I saw that, contrary to what the old docs say, they are both set to an optimal value:

/sys/dev/ixgbe/ixgbe.c:

/*
** Number of TX descriptors per ring,
** setting higher than RX as this seems
** the better performing choice.
*/
static int ixgbe_txd = PERFORM_TXD;
TUNABLE_INT("hw.ixgbe.txd", &ixgbe_txd);

/* Number of RX descriptors per ring */
static int ixgbe_rxd = PERFORM_RXD;
TUNABLE_INT("hw.ixgbe.rxd", &ixgbe_rxd);

/sys/dev/ixgbe/ixgbe.h:

/*
 * TxDescriptors Valid Range: 64-4096 Default Value: 256 This value is the
 * number of transmit descriptors allocated by the driver. Increasing this
 * value allows the driver to queue more transmits. Each descriptor is 16
 * bytes. Performance tests have show the 2K value to be optimal for top
 * performance.
 */
#define DEFAULT_TXD 1024
#define PERFORM_TXD 2048
#define MAX_TXD 4096
#define MIN_TXD 64

So, here's my kernel config for your viewing pleasure:

include GENERIC
ident SHAPER
nomakeoptions DEBUG
nooptions COMPAT_FREEBSD4 # Compatible with FreeBSD4
nooptions COMPAT_FREEBSD5 # Compatible with FreeBSD5
nooptions COMPAT_FREEBSD6 # Compatible with FreeBSD6
options COMPAT_FREEBSD7 # Compatible with FreeBSD7
nooptions COMPAT_FREEBSD32 # Compatible with i386 binaries
nooptions INET6 # IPv6 communications protocols
options ZERO_COPY_SOCKETS
# XXX 20091227: em(4) wants DEVICE_POLLING off for its fast-interrupts to work
#options DEVICE_POLLING
options IPFIREWALL #firewall
options IPFIREWALL_DEFAULT_TO_ACCEPT #allow everything by default

Here's /etc/sysctl.conf:

net.inet.ip.fw.verbose=0
kern.ipc.shmall=65536
kern.ipc.shmmax=268435456
kern.ipc.semmap=1024
kern.ipc.nmbclusters=11
net.inet.ip.fastforwarding=1
net.inet.ip.dummynet.io_fast=1
#XXX no longer used in 8.3??
net.isr.direct=0
net.inet.ip.intr_queue_maxlen=5000
hw.intr_storm_threshold=9000
#dev.em.0.rx_processing_limit=-1 # device not used any more

Any tips? I'll be happy to try and add some more info upon request.

Thanks.
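To put the descriptor counts quoted from ixgbe.h above into bytes, here is a small sketch built only from those #define values and the "16 bytes per descriptor" figure in the header comment. The helpers txd_sanitize and tx_ring_bytes are hypothetical, and the fall-back-to-DEFAULT_TXD behaviour for out-of-range values is an assumption, not something taken from the driver source quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_TXD   1024
#define PERFORM_TXD   2048
#define MAX_TXD       4096
#define MIN_TXD       64
#define TX_DESC_SIZE  16    /* bytes per descriptor, per the ixgbe.h comment */

/* Assumption: a value outside MIN_TXD..MAX_TXD falls back to DEFAULT_TXD. */
static int
txd_sanitize(int txd)
{
    return ((txd < MIN_TXD || txd > MAX_TXD) ? DEFAULT_TXD : txd);
}

/* Memory taken by the descriptors of one TX ring, in bytes. */
static size_t
tx_ring_bytes(int txd)
{
    assert(txd >= MIN_TXD && txd <= MAX_TXD);
    return ((size_t)txd * TX_DESC_SIZE);
}
```

Under these assumptions the PERFORM_TXD setting of 2048 descriptors costs 32768 bytes of descriptor memory per ring, and the 4096 maximum costs 65536 bytes, so raising the tunable further is cheap in memory terms.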
Re: Slow Intel 10GbE CX4 adapter behaviour
On 9 Feb, 2011, at 07:29, rihad wrote:

> With the then used FreeBSD 7.2 and the built-in FreeBSD ixgbe driver 1.7.3
> (IIRC) it was very slow, and at only about 300-400 mbps load (~30-50 IP
> kpps) the internet access was very slow.
>
> [...]
>
> Any tips? I'll be happy to try and add some more info upon request.

I don't know if it's the same issue, but I had severe performance issues with ixgbe cards until I disabled LRO (ifconfig ix0 -lro). That was on 7.2 too.

Regards,
Nikolay
Re: igb driver RX (was TX) hangs when out of mbuf clusters
Hmmm, well so much for that theory :)

Jack


On Tue, Feb 8, 2011 at 4:06 PM, Karim Fodil-Lemelin <fodillemlinka...@gmail.com> wrote:

> 2011/2/8 Jack Vogel
>
>> I have been following this, and thinking about it. I still am working
>> from a theoretical standpoint, but based on a patch I got quite a long
>> time back and never quite grokked, I believe now that I might have a
>> solution.
>>
>> The original PR and patch was kern/150516 from Beezar Liu, I was never
>> quite comfortable with the code changes, nor convinced that it was a
>> real issue and not a misunderstanding. However I think now that this
>> very report might be behind what we are seeing today. I have a slightly
>> different approach to solving it, of course it remains to be seen if it
>> handles it properly.
>>
>> Please try the patch I've attached, I'm open to further correction or
>> polishing of the changes. And thanks to Beezar for his original report
>> and changes, this is not for em, but if this eliminates the problem it's
>> clearly needed in all drivers.
>>
>> Jack
>
> Hi Jack,
>
> Thanks for your help. I tried your patch and it didn't work, so I added a
> couple of printfs to see if the added code was getting hit:
>
> --- a/freebsd/sys/dev/e1000/if_igb.c
> +++ b/freebsd/sys/dev/e1000/if_igb.c
> @@ -612,7 +612,7 @@ igb_attach(device_t dev)
>              device_get_nameunit(dev));
>
>          INIT_DEBUGOUT("igb_attach: end");
> -
> +        printf("this driver has a patch from Jack Vogel\n");
>          return (0);
>
>  err_late:
> @@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int *done)
>                  struct mbuf *sendmp, *mh, *mp;
>                  struct igb_rx_buf *rxbuf;
>                  u16 hlen, plen, hdr, vtag;
> +                int commit;
>                  bool eop = FALSE;
>
>                  cur = &rxr->rx_base[i];
> @@ -4255,10 +4256,23 @@ next_desc:
>                  bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
>                      BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
>
> +                commit = i; /* capture the old index */
> +
>                  /* Advance our pointers to the next descriptor. */
>                  if (++i == adapter->num_rx_desc)
>                          i = 0;
>                  /*
> +                ** Sanity test for ring full, if this
> +                ** happens we need to refresh immediately
> +                ** or refresh may deadlock.
> +                */
> +                if (i == rxr->next_to_refresh) {
> +                        igb_refresh_mbufs(rxr, commit);
> +                        printf("igb_refresh_mbufs called with commit %d\n", commit);
> +                        processed = 0;
> +                }
> +
> +                /*
>                  ** Send to the stack or LRO
>                  */
>                  if (sendmp != NULL) {
>
> Here are the results:
>
> # dmesg | grep Vogel
> this driver has a patch from Jack Vogel
> this driver has a patch from Jack Vogel
>
> # netstat -m
> 60453/52707/113160 mbufs in use (current/cache/total)
> 48416/51584/10/10 mbuf clusters in use (current/cache/total/max)
> 2894/690 mbuf+clusters out of packet secondary zone in use (current/cache)
> 11946/854/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 164834K/119760K/284595K bytes allocated to network (current/cache/total)
> 0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/4/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> # dmesg | grep commit
>
> At this point RX has hung.
>
> Somehow the check (i == rxr->next_to_refresh) is never true in this case.
> Also, I did read kern/150516 and couldn't wrap my head around the patch
> for the em driver that Beezar Liu suggested.
>
> Regards,
>
> Karim.