Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-23 Thread Fabien Thomas

On 22 Feb 2012, at 22:51, Jack Vogel wrote:

> On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo  wrote:
> 
>> On Wed, Feb 22, 2012 at 09:09:46PM +, Ben Hutchings wrote:
>>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
>> ...
 I have hit this problem recently, too.
 Maybe the issue mostly/only exists on 32-bit systems.
>>> 
>>> No, we kept hitting mbuf pool limits on 64-bit systems when we started
>>> working on FreeBSD support.
>> 
>> ok never mind then, the mechanism would be the same, though
>> the limits (especially VM_LIMIT) would be different.
>> 
 Here is a possible approach:
 
 1. nmbclusters consume the kernel virtual address space so there
   must be some upper limit, say
 
VM_LIMIT = 256000 (translates to 512MB of address space)
 
 2. also you don't want the clusters to take up too much of the
>> available
   memory. This one would only trigger for minimal-memory systems,
   or virtual machines, but still...
 
MEM_LIMIT = (physical_ram / 2) / 2048
 
 3. one may try to set a suitably large, desirable number of buffers
 
TARGET_CLUSTERS = 128000
 
 4. and finally we could use the current default as the absolute minimum
 
MIN_CLUSTERS = 1024 + maxusers*64
 
 Then at boot the system could say
 
nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
 
nmbclusters = max(nmbclusters, MIN_CLUSTERS)
 
 
 In turn, i believe interfaces should do their part and by default
 never try to allocate more than a fraction of the total number
 of buffers,
>>> 
>>> Well what fraction should that be?  It surely depends on how many
>>> interfaces are in the system and how many queues the other interfaces
>>> have.
>> 
 if necessary reducing the number of active queues.
>>> 
>>> So now I have too few queues on my interface even after I increase the
>>> limit.
>>> 
>>> There ought to be a standard way to configure numbers of queues and
>>> default queue lengths.
>> 
>> Jack raised the problem that there is a poorly chosen default for
>> nmbclusters, causing one interface to consume all the buffers.
>> If the user explicitly overrides the value then
>> the number of cluster should be what the user asks (memory permitting).
>> The next step is on devices: if there are no overrides, the default
>> for a driver is to be lean. I would say that topping the request between
>> 1/4 and 1/8 of the total buffers is surely better than the current
>> situation. Of course if there is an explicit override, then use
>> it whatever happens to the others.
>> 
>> cheers
>> luigi
>> 
> 
> Hmmm, well, I could make the default use only 1 queue or something like
> that,
> was thinking more of what actual users of the hardware would want.
> 

I think it is more reasonable to set up the interface with one queue.
Even if the cluster count does not hit the maximum, you will end up with an 
unbalanced setting that 
leaves a very low mbuf count for other uses.


> After the installed kernel is booted and the admin would do whatever post
> install
> modifications they wish it could be changed, along with nmbclusters.
> 
> This was why i sought opinions, of the algorithm itself, but also anyone
> using
> ixgbe and igb in heavy use, what would you find most convenient?
> 
> Jack
> ___
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"



Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Fabien Thomas
>> 
> 
> I have a patch that has been sitting around for a long time due to
> review cycle latency that caches a pointer to the rtentry (and
> llentry) in the the inpcb. Before each use the rtentry is checked
> against a generation number in the routing tree that is incremented on
> every routing table update.

Hi Kip,

Is there a public location for the patch?
What can be done to speed up the commit: testing?

Fabien



Re: ixgbe & if_igb RX ring locking

2012-10-19 Thread Fabien Thomas

On 18 Oct 2012, at 20:09, Jack Vogel wrote:

> On Thu, Oct 18, 2012 at 6:20 AM, Andre Oppermann wrote:
> 
>> On 13.10.2012 20:22, Luigi Rizzo wrote:
>> 
>>> On Sat, Oct 13, 2012 at 09:49:21PM +0400, Alexander V. Chernikov wrote:
>>> 
 Hello list!
 
 
 Packets receiving code for both ixgbe and if_igb looks like the
 following:
 
 
 ixgbe_msix_que
 
 -- ixgbe_rxeof()
{
   IXGBE_RX_LOCK(rxr);
 while
 {
get_packet;
 
-- ixgbe_rx_input()
   {
  ++ IXGBE_RX_UNLOCK(rxr);
  if_input(packet);
  ++ IXGBE_RX_LOCK(rxr);
   }
 
 }
   IXGBE_RX_UNLOCK(rxr);
 }
 
 Lines marked with ++ appeared in r209068(igb) and r217593(ixgbe).
 
 These lines probably mask LORs (if any) well.
 However, such a change introduces quite a significant performance drop:
 
 On my routing setup (nearly the same from previous -Intel 10G thread in
 -net) adding lock/unlock causes 2.8MPPS decrease to 2.3MPPS which is
 nearly 20%.
 
>>> 
>>> one option could be (same as it is done in the timer
>>> routine in dummynet) to build a list of all the packets
>>> that need to be sent to if_input(), and then call
>>> if_input with the entire list outside the lock.
>>> 
>>> It would be even easier if we modify the various *_input()
>>> routines to handle a list of mbufs instead of just one.
>>> 
>> 
>> Not really. You'd just run into tons of layering complexity.
>> Somewhere the decomposition and serialization has to be done.
>> 
>> Perhaps the right place is to dequeue a batch of packets from
>> the HW ring and then have a task/thread send it up the stack
>> one by one.
>> 
> 
> I was thinking about how to code this, something like what I did with
> the refresh routine, in any case I will experiment with it.

This modified version for multiqueue polling creates a list of packets that are 
injected later (mc is the list):
http://www.gitorious.org/~fabient/freebsd/fabient-freebsd/blobs/work/pollng_mq_stable_8/sys/dev/ixgbe/ixgbe.c#line4615


> 
> Jack



Re: [patch] reducing arp locking

2012-11-09 Thread Fabien Thomas

On 8 Nov 2012, at 11:25, Alexander V. Chernikov wrote:

> On 08.11.2012 14:24, Andre Oppermann wrote:
>> On 08.11.2012 00:24, Alexander V. Chernikov wrote:
>>> Hello list!
>>> 
>>> Currently we need to acquire 2 read locks to perform simple 6-byte
>>> copying from arp record to packet
>>> ethernet header.
>>> 
>>> It seems that acquiring lle lock for fast path (main traffic flow) is
>>> not necessary even with
>>> current code.
>>> 
>>> My tests shows ~10% improvement with this patch applied.
>>> 
>>> If nobody objects I plan to commit this change at the end of next week.
>> 
>> This is risky and prone to race conditions.  The copy of the MAC address
>> should be done while the table read lock is held to protect against the
> It is done exactly as you say: table read lock is held.

How do you protect against an entry update if I hold a reference to the entry?
You can end up doing a bcopy of a partial MAC address.
The la_preempt modification is also a write access to an unlocked structure.


> 
>> entry going away.  You can either return with table lock held and drop
>> it after the copy, or you could a modified lookup function that takes a
>> pointer for the copy destination, do the copy with the read lock, and then
>> return.  If no entry is found an error is returned and obviously no copy
>> is done.
>> 
> 
> 
> -- 
> WBR, Alexander
> 
> 
> ___
> freebsd-hack...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"



Re: [patch] reducing arp locking

2012-11-09 Thread Fabien Thomas

On 9 Nov 2012, at 10:05, Alexander V. Chernikov wrote:

> On 09.11.2012 12:51, Fabien Thomas wrote:
>> 
>> On 8 Nov 2012, at 11:25, Alexander V. Chernikov wrote:
>> 
>>> On 08.11.2012 14:24, Andre Oppermann wrote:
>>>> On 08.11.2012 00:24, Alexander V. Chernikov wrote:
>>>>> Hello list!
>>>>> 
>>>>> Currently we need to acquire 2 read locks to perform simple 6-byte
>>>>> copying from arp record to packet
>>>>> ethernet header.
>>>>> 
>>>>> It seems that acquiring lle lock for fast path (main traffic flow) is
>>>>> not necessary even with
>>>>> current code.
>>>>> 
>>>>> My tests shows ~10% improvement with this patch applied.
>>>>> 
>>>>> If nobody objects I plan to commit this change at the end of next week.
>>>> 
>>>> This is risky and prone to race conditions.  The copy of the MAC address
>>>> should be done while the table read lock is held to protect against the
>>> It is done exactly as you say: table read lock is held.
>> 
>> How do you protect against an entry update if I hold a reference to the entry?
>> You can end up doing a bcopy of a partial MAC address.
> I see no problem in copying an incorrect mac address in that case:
> if the host mac address is updated, this is most likely another host, and 
> several packets being lost changes nothing.

Sending packets to a bogus MAC address is not really nothing :)

> 
> However, there can be some realistic scenario where this can be the case (L2 
> load balancing/failover). I'll update in_arpinput() to do lle 
> removal/insertion in that case.
> 
>> The la_preempt modification is also a write access to an unlocked structure.
> This one changes nothing:
> current code does this under _read_ lock.

Under the table lock, not the entry lock?
The table lock is there to protect the table, if I've understood the code correctly.
If I get an exclusive reference to the entry, you will end up reading and 
writing to the entry without any lock.

> 
>> 
>> 
>>> 
>>>> entry going away.  You can either return with table lock held and drop
>>>> it after the copy, or you could a modified lookup function that takes a
>>>> pointer for the copy destination, do the copy with the read lock, and then
>>>> return.  If no entry is found an error is returned and obviously no copy
>>>> is done.
>>>> 
>>> 
>>> 
>>> --
>>> WBR, Alexander
>>> 
>>> 
>> 
>> 
> 



Re: [patch] reducing arp locking

2012-11-09 Thread Fabien Thomas

On 9 Nov 2012, at 12:18, Alexander V. Chernikov wrote:

> On 09.11.2012 13:59, Fabien Thomas wrote:
>> 
>> On 9 Nov 2012, at 10:05, Alexander V. Chernikov wrote:
>> 
>>> On 09.11.2012 12:51, Fabien Thomas wrote:
>>>> 
>>>> On 8 Nov 2012, at 11:25, Alexander V. Chernikov wrote:
>>>> 
>>>>> On 08.11.2012 14:24, Andre Oppermann wrote:
>>>>>> On 08.11.2012 00:24, Alexander V. Chernikov wrote:
>>>>>>> Hello list!
>>>>>>> 
>>>>>>> Currently we need to acquire 2 read locks to perform simple 6-byte
>>>>>>> copying from arp record to packet
>>>>>>> ethernet header.
>>>>>>> 
>>>>>>> It seems that acquiring lle lock for fast path (main traffic flow) is
>>>>>>> not necessary even with
>>>>>>> current code.
>>>>>>> 
>>>>>>> My tests shows ~10% improvement with this patch applied.
>>>>>>> 
>>>>>>> If nobody objects I plan to commit this change at the end of next week.
>>>>>> 
>>>>>> This is risky and prone to race conditions.  The copy of the MAC address
>>>>>> should be done while the table read lock is held to protect against the
>>>>> It is done exactly as you say: table read lock is held.
>>>> 
>>>> How do you protect against an entry update if I hold a reference to the entry?
>>>> You can end up doing a bcopy of a partial MAC address.
>>> I see no problem in copying an incorrect mac address in that case:
>>> if the host mac address is updated, this is most likely another host, and 
>>> several packets being lost changes nothing.
>> 
>> Sending packets to a bogus MAC address is not really nothing :)
>> 
>>> 
>>> However, there can be some realistic scenario where this can be the case 
>>> (L2 load balancing/failover). I'll update in_arpinput() to do lle 
>>> removal/insertion in that case.
>>> 
>>>> la_preempt modification is also write access to an unlocked structure.
>>> This one changes nothing:
>>> current code does this under _read_ lock.
>> 
>> Under the table lock not the entry lock ?
> lle entry is read-locked while la_preempt is modified.
> 
>> Table lock is here to protect the table if I've understood the code 
>> correctly.
> Yes.
>> If i get an exclusive reference to the entry you will end up reading and 
>> writing to the entry without any lock.
> Yes. And the only drawback in the worst case is sending a few more 
> packets to the right (but probably expired) MAC address.

Or a partial copy of the new MAC address.
> 
> I'm talking about the following:
> The ARP stack is just an IP -> 6-byte mapping; there is no reason to make it 
> unnecessarily complicated like rte, with references being held by the upper-layer 
> stack. It does not contain an interface pointer, etc.
> 
> We may need to r/w lock the entry, but for 'control plane' code only.
> If one acquires an exclusive lock and wants to change the STATIC flag to non-static 
> or change the lle address, this is simply wrong and has to be handled by 
> acquiring the table wlock.
> 
> Current ARP code has some flaws like handling arp expiration, but this patch 
> doesn't change much here..

In in_arpinput, only exclusive access to the entry is taken during the update, with no 
IF_AFDATA_LOCK; that's why I was surprised.


> 
>> 
>>> 
>>>> 
>>>> 
>>>>> 
>>>>>> entry going away.  You can either return with table lock held and drop
>>>>>> it after the copy, or you could a modified lookup function that takes a
>>>>>> pointer for the copy destination, do the copy with the read lock, and 
>>>>>> then
>>>>>> return.  If no entry is found an error is returned and obviously no copy
>>>>>> is done.
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> WBR, Alexander
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> WBR, Alexander
> 
> 



Re: [patch] reducing arp locking

2012-11-09 Thread Fabien Thomas

On 9 Nov 2012, at 17:43, Ingo Flaschberger wrote:

> On 09.11.2012 15:03, Fabien Thomas wrote:
>> In in_arpinput, only exclusive access to the entry is taken during the update, 
>> with no IF_AFDATA_LOCK; that's why I was surprised.
> 
> what about this:

I'm not against optimizing, but an API that seemed clear (correct me if I'm 
wrong):
- one lock for list modification
- one RW lock for lle entry access
- one refcount for pointer references

is now a lot more unclear and, from my point of view, dangerous.

My next question is: why do we need a per-entry lock if we use the table lock to 
protect entry access?

Fabien
 
> --
> --- /usr/src/sys/netinet/if_ether.c_org 2012-11-09 16:15:43.0 +
> +++ /usr/src/sys/netinet/if_ether.c 2012-11-09 16:16:37.0 +
> @@ -685,7 +685,7 @@
>flags |= LLE_EXCLUSIVE;
>IF_AFDATA_LOCK(ifp);
>la = lla_lookup(LLTABLE(ifp), flags, (struct sockaddr *)&sin);
> -   IF_AFDATA_UNLOCK(ifp);
> +
>if (la != NULL) {
>/* the following is not an error when doing bridging */
>if (!bridged && la->lle_tbl->llt_ifp != ifp && !carp_match) {
> @@ -697,12 +697,14 @@
>ifp->if_addrlen, (u_char *)ar_sha(ah), ":",
>ifp->if_xname);
>LLE_WUNLOCK(la);
> +   IF_AFDATA_UNLOCK(ifp);
>goto reply;
>}
>if ((la->la_flags & LLE_VALID) &&
>bcmp(ar_sha(ah), &la->ll_addr, ifp->if_addrlen)) {
>if (la->la_flags & LLE_STATIC) {
>LLE_WUNLOCK(la);
> +   IF_AFDATA_UNLOCK(ifp);
>if (log_arp_permanent_modify)
>log(LOG_ERR,
>"arp: %*D attempts to modify "
> @@ -725,6 +727,7 @@
> 
>if (ifp->if_addrlen != ah->ar_hln) {
>LLE_WUNLOCK(la);
> +   IF_AFDATA_UNLOCK(ifp);
>log(LOG_WARNING, "arp from %*D: addr len: new %d, "
>"i/f %d (ignored)\n", ifp->if_addrlen,
>(u_char *) ar_sha(ah), ":", ah->ar_hln,
> @@ -763,14 +766,19 @@
>la->la_numheld = 0;
>memcpy(&sa, L3_ADDR(la), sizeof(sa));
>LLE_WUNLOCK(la);
> +   IF_AFDATA_UNLOCK(ifp);
>for (; m_hold != NULL; m_hold = m_hold_next) {
>m_hold_next = m_hold->m_nextpkt;
>m_hold->m_nextpkt = NULL;
>(*ifp->if_output)(ifp, m_hold, &sa, NULL);
>}
> -   } else
> +   } else {
>LLE_WUNLOCK(la);
> -   }
> +   IF_AFDATA_UNLOCK(ifp);
> +}
> +   } else {
> +   IF_AFDATA_UNLOCK(ifp);
> +}
> reply:
>if (op != ARPOP_REQUEST)
>goto drop;
> --
> 
> Kind regards,
>Ingo Flaschberger
> 



Re: [patch] reducing arp locking

2012-11-12 Thread Fabien Thomas

On 9 Nov 2012, at 19:55, Alexander V. Chernikov wrote:

> On 09.11.2012 20:51, Fabien Thomas wrote:
>> 
>> On 9 Nov 2012, at 17:43, Ingo Flaschberger wrote:
>> 
>>> On 09.11.2012 15:03, Fabien Thomas wrote:
>>>> In in_arpinput, only exclusive access to the entry is taken during the 
>>>> update, with no IF_AFDATA_LOCK; that's why I was surprised.
> I'll update patch to reflect changes discussed in previous e-mails.
>>> 
>>> what about this:
>> 
>> I'm not against optimizing, but an API that seemed clear (correct me if I'm 
>> wrong):
>> - one lock for list modification
>> - one RW lock for lle entry access
>> - one refcount for pointer references
>> 
>> is now a lot more unclear and, from my point of view, dangerous.
> 
> This can be changed/documented as the following:
> - table rw lock for list modification
> - table rw lock for lle_addr and la_expire changes
> - per-lle rw lock for refcount and other fields not used by 'main path' code

Yes, that's fine if it is documented and if every access to lle_addr and la_expire is 
under the table lock.

>> 
>> My next question is why do we need a per entry lock if we use the table lock 
>> to protect entry access?
> Because there are other cases, like sending traffic to an unresolved rte (arp 
> request sent, but the reply not yet received, and we have to maintain a packet 
> queue for that destination).
> 
> .. and it seems flags handling (LLE_VALID) should be done with more care.
>> 
>> Fabien
>>> 
>> 
>> 
> 
> 
> 
> -- 
> WBR, Alexander
> 



Re: request for MFC of em/igb drivers

2010-11-23 Thread Fabien Thomas
That fix would also be great to commit to ixgbe before the release.
It fixes a crash under high packet load with bpf (an mbuf freed while still under 
bpf analysis).

Fabien



patch-ixgbe-bpfcrash
Description: Binary data
> 

> On 17.11.2010 23:39, Jack Vogel wrote:
>> Yes, everyone, I plan on updating all the drivers, there has been no
>> activity
>> because I've tracking down a couple bugs that are tough, involving days
>> of testing to reproduce. I know we're getting close and I appreciate any
>> reports like this before.
>> 
>> Stay tuned
>> 
>> Jack
> 
> Thanks for response. Do you plan to MFC fixes before 8.2-RELEASE?
> We are in PRERELEASE state already :-)


Re: lagg/lacp poor traffic distribution

2010-12-21 Thread Fabien Thomas
>>> Hi!
>>> 
>>> I've loaded router using two lagg interfaces in LACP mode.
>>> lagg0 has IP address and two ports (em0 and em1) and carry untagged frames.
>>> lagg1 has no IP address and has two ports (igb0 and igb1) and carry
>>> about 1000 dot-q vlans with lots of hosts in each vlan.
>>> 
>>> For lagg1, lagg distributes outgoing traffic over two ports just fine.
>>> For lagg0 (untagged ethernet segment with only 2 MAC addresses)
>>> less than 0.07% (54Mbit/s max) of traffic goes to em0
>>> and over 99.92% goes to em1, that's bad.
>>> 
>>> That's general traffic of several thousands of customers surfing the web,
>>> using torrents etc.  I've glanced over lagg/lacp sources if src/sys/net/
>>> and found nothing suspicious, it should extract and use srcIP/dstIP for 
>>> hash.
>>> 
>>> How do I debug this problem?
>>> 
>>> Eugene Grosbein
>> 
>> I had this problem with the igb driver, and I found that lagg selects the
>> outgoing interface based on the packet header flowid field if the M_FLOWID flag
>> is set. And in the igb driver code the flowid is set as:
>> 
>> #if __FreeBSD_version >= 800000
>> 		rxr->fmp->m_pkthdr.flowid = que->msix;
>> 		rxr->fmp->m_flags |= M_FLOWID;
>> #endif
>> 
>> The same thing happens in the em driver with MULTIQUEUE.
>> 
>> That does not give a large enough number of flows to balance traffic well, so I
>> commented out the check in if_lagg.c:
>> 
>> lagg_lb_start(struct lagg_softc *sc, struct mbuf *m)
>> {
>> <-->struct lagg_lb *lb = (struct lagg_lb *)sc->sc_psc;
>> <-->struct lagg_port *lp = NULL;
>> <-->uint32_t p = 0;
>> 
>> //	if (m->m_flags & M_FLOWID)
>> //		p = m->m_pkthdr.flowid;
>> //	else
>> 
>> and with this change I have much better load distribution across interfaces.
>> 
>> Hope it helps.
> 
> You are perfectly right. By disabling flow usage I've obtained load sharing
> close to even (final patch follows). Two questions:
> 
> 1. Is it a bug or design problem?

How many queues do you have with igb? If it's one, that will explain why the flowid 
is bad for load balancing with lagg.
The problem is that the flowid only works well if the number of queues is equal to, 
or a multiple of, the number of lagg ports.

> 2. Will I get problems like packet reordering by permanently disabling
> usage of these flows in lagg(4)?
> 
> --- if_lagg.c.orig2010-12-20 22:53:21.0 +0600
> +++ if_lagg.c 2010-12-21 13:37:20.0 +0600
> @@ -168,6 +168,11 @@
> &lagg_failover_rx_all, 0,
> "Accept input from any interface in a failover lagg");
> 
> +int lagg_use_flows = 1;
> +SYSCTL_INT(_net_link_lagg, OID_AUTO, use_flows, CTLFLAG_RW,
> +&lagg_use_flows, 1,
> +"Use flows for load sharing");
> +
> static int
> lagg_modevent(module_t mod, int type, void *data)
> {
> @@ -1666,7 +1671,7 @@
>   struct lagg_port *lp = NULL;
>   uint32_t p = 0;
> 
> - if (m->m_flags & M_FLOWID)
> + if (lagg_use_flows && (m->m_flags & M_FLOWID))
>   p = m->m_pkthdr.flowid;
>   else
>   p = lagg_hashmbuf(m, lb->lb_key);
> --- if_lagg.h.orig2010-12-21 16:34:35.0 +0600
> +++ if_lagg.h 2010-12-21 16:35:27.0 +0600
> @@ -242,6 +242,8 @@
> int   lagg_enqueue(struct ifnet *, struct mbuf *);
> uint32_t  lagg_hashmbuf(struct mbuf *, uint32_t);
> 
> +extern int   lagg_use_flows;
> +
> #endif /* _KERNEL */
> 
> #endif /* _NET_LAGG_H */
> --- ieee8023ad_lacp.c.orig2010-12-21 16:36:09.0 +0600
> +++ ieee8023ad_lacp.c 2010-12-21 16:35:58.0 +0600
> @@ -812,7 +812,7 @@
>   return (NULL);
>   }
> 
> - if (m->m_flags & M_FLOWID)
> + if (lagg_use_flows && (m->m_flags & M_FLOWID))
>   hash = m->m_pkthdr.flowid;
>   else
>   hash = lagg_hashmbuf(m, lsc->lsc_hashkey);
> 
> Eugene Grosbein



Re: lagg/lacp poor traffic distribution

2010-12-21 Thread Fabien Thomas

On Dec 21, 2010, at 3:00 PM, Eugene Grosbein wrote:

> On 21.12.2010 19:11, Fabien Thomas wrote:
> 
>>>> I had this problem with igb driver, and I found, that lagg selects
>>>> outgoing interface based on packet header flowid field if M_FLOWID field
>>>> is set. And in the igb driver code flowid is set as 
>>>> 
>>>> #if __FreeBSD_version >= 800000
>>>> 		rxr->fmp->m_pkthdr.flowid = que->msix;
>>>> 		rxr->fmp->m_flags |= M_FLOWID;
>>>> #endif
>>>> 
>>>> The same thing in em driver with MULTIQUEUE 
>>>> 
>>>> That does not give enough number of flows to balance traffic well, so I
>>>> commented check in if_lagg.c
>>>> 
>>>> lagg_lb_start(struct lagg_softc *sc, struct mbuf *m)
>>>> {
>>>> <-->struct lagg_lb *lb = (struct lagg_lb *)sc->sc_psc;
>>>> <-->struct lagg_port *lp = NULL;
>>>> <-->uint32_t p = 0;
>>>> 
>>>> //	if (m->m_flags & M_FLOWID)
>>>> //		p = m->m_pkthdr.flowid;
>>>> //	else
>>>> 
>>>> and with this change I have much better load distribution across 
>>>> interfaces.
>>>> 
>>>> Hope it helps.
>>> 
>>> You are perfectly right. By disabling flow usage I've obtained load sharing
>>> close to even (final patch follows). Two questions:
>>> 
>>> 1. Is it a bug or design problem?
>> 
>> How many queues have you with igb? If it's one it will explain why the 
>> flowid is bad for load balancing with lagg.
> 
> How do I know? I've read igb(4) manual page and found no words
vmstat -i will show the queues (one interrupt line per queue); normally the queue 
count is the number of CPUs available.



> about queues within igb, nor I have any knowledge about them.
> 
>> The problem is that flowid is good if the number of queue is = or a multiple 
>> of the number of lagg ports.
> 
> Now I see, thanks.
> 
> Eugene Grosbein



Re: lagg/lacp poor traffic distribution

2010-12-21 Thread Fabien Thomas

On Dec 21, 2010, at 3:48 PM, Eugene Grosbein wrote:

> On 21.12.2010 20:41, Fabien Thomas wrote:
> 
>>>>> 1. Is it a bug or design problem?
>>>> 
>>>> How many queues have you with igb? If it's one it will explain why the 
>>>> flowid is bad for load balancing with lagg.
>>> 
>>> How do I know? I've read igb(4) manual page and found no words
>> vmstat -i will show the queue (intr for the queue) normally it's the number 
>> of CPU available.
> 
> # vmstat -i
> interrupt  total   rate
> irq5: uart28  0
> irq18: ehci0 uhci5+2  0
> irq19: uhci2 uhci4+ 2182  0
> irq23: uhci3 ehci1   124  0
> cpu0: timer 39576224   1993
> irq256: em0:rx 0   115571349   5822
> irq257: em0:tx 0   136632905   6883
> irq259: em1:rx 0   115829181   5835
> irq260: em1:tx 0   138838991   6994
> irq262: igb0:que 0 157354922   7927
> irq263: igb0:que 1577369 29
> irq264: igb0:que 2280207 14
> irq265: igb0:que 3241826 12
> irq266: igb0:link  2  0
> irq267: igb1:que 0 164620363   8293
> irq268: igb1:que 1238678 12
> irq269: igb1:que 2248478 12
> irq270: igb1:que 3762453 38
> irq271: igb1:link  3  0
> cpu2: timer 39576052   1993
> cpu3: timer 39576095   1993
> cpu1: timer 39575913   1993
> Total  989503327  49849
> 
> It seems I have four queues per igb card but only one of them works?

Yes.

Jack will certainly confirm, but it seems that the RSS hash does not take the 
VLAN tag into account and everything defaults to queue 0?


> 
> Eugene Grosbein



Re: lagg/lacp poor traffic distribution

2010-12-23 Thread Fabien Thomas

On Dec 22, 2010, at 6:55 PM, Eugene Grosbein wrote:

> On 21.12.2010 21:57, Fabien Thomas wrote:
> 
>>> irq262: igb0:que 0 157354922   7927
>>> irq263: igb0:que 1577369 29
>>> irq264: igb0:que 2280207 14
>>> irq265: igb0:que 3241826 12
>>> irq266: igb0:link  2  0
>>> irq267: igb1:que 0 164620363   8293
>>> irq268: igb1:que 1238678 12
>>> irq269: igb1:que 2248478 12
>>> irq270: igb1:que 3762453 38
>>> irq271: igb1:link  3  0
>>> cpu2: timer 39576052   1993
>>> cpu3: timer 39576095   1993
>>> cpu1: timer 39575913   1993
>>> Total  989503327  49849
>>> 
>>> It seems I have four queues per igb card but only one of them works?
>> 
>> Yes.
>> 
>> Jack will certainly confirm, but it seems that the RSS hash does not take the 
>> VLAN tag into account and everything defaults to queue 0?
> 
> I've just read "Microsoft Receive-Side Scaling" documentation,
> http://download.microsoft.com/download/5/d/6/5d6eaf2b-7ddf-476b-93dc-7cf0072878e6/ndis_rss.doc
> 
> RSS defines that the hash function may take the IP addresses and optionally the 
> port numbers only, not vlan tags.
> In the case of PPPoE-only traffic, this card's ability to classify traffic vanishes.
> Then unpatched lagg fails to share load over the outgoing interface ports.
> 
> It seems, we really need sysctl disabling lagg's use of flows, don't we?

Yes, I think it is necessary to be able to disable it, because it cannot always be 
optimal.
One improvement for the queue count issue would be to hash the queue id before 
applying the modulo.

> 
> Eugene Grosbein



Polling with multiqueue support

2011-02-24 Thread Fabien Thomas
Ryan's post on the new polling modifications reminded me to post a quick note about 
the polling with multiqueue support that I did some months ago.

The code is more intrusive and adds a new handler per queue.
The handling of the network is nearly the same as the deferred taskqueues in the 
drivers.
There are now two passes: one for receive and one for transmit (to flush 
pending transmits when all receive passes are done).

The main gain is for packet forwarding with more than one interface.
CPUs can easily be reserved for applications by binding a specific number of 
cores to the network.
Performance is on par with interrupts on 10Gb or 1Gb interfaces, and latency can 
be reduced by using a higher HZ.
Most of the time, using fewer cores achieves higher global efficiency of the system 
by freeing CPU cycles and reducing contention.

Ex setup:

6 cores CPU, 2 ixgbe with 3 queue, 4 igb with 3 queue

with 3 cores for polling:
CPU0 will handle ixgbe0 queue 0, ixgbe1 queue 0, igb0 queue0, ...
CPU1 will handle ixgbe0 queue 1, ...
...

For those interested a test branch can be found here based on 8.x with ixgbe / 
igb and em modification:
http://www.gitorious.org/~fabient/freebsd/fabient-sandbox/commits/work/pollng_mq_stable_8
Extracted patchset here:
http://people.freebsd.org/~fabient/patch-poll_mq-20110202-stable_8
http://people.freebsd.org/~fabient/kern_poll.c-20110202 -> put to 
kern/kern_poll.c

--
Fabien Thomas






Re: Polling with multiqueue support

2011-02-24 Thread Fabien Thomas
On Feb 24, 2011, at 4:39 PM, Ryan Stone wrote:

> Ah, you've anticipated me.  This is just the kind of thing that I had
> in mind.  I have some comments:

Thanks for your feedback.
You pushed me out of my laziness into explaining the patchset on the mailing list.  :)

> 
> - Why allocate the poll_if_t in ether_poll_register_?  If you let the
> driver allocate it you don't have to worry about failure.  And the
> driver can embed it in its rx_ring so it doesn't have to worry about
> malloc'ing it anyway.  We can also put one in struct ifnet to preserve
> the traditional ether_poll_register interface.

Good point. I'll add it to my TODO list.

> 
> - I'd personally prefer it if ether_poll_register_mq didn't require a
> struct ifnet to be passed in.  Nothing seems to use the ifnet anymore
> and here at $(WORK) we have some non-ifnets that actually register a
> polling handler.  Currently they just allocate a struct ifnet to make
> the polling code happy but I don't see any reason to require one.
To be sure I understand: use context + queue id only as the identifier for the mq
part, and grab the ifp from the context in the driver?
That seems OK to me if it unblocks the case where you don't have an ifp.

> 
> - Also, I don't quite understand why the separate TX step is necessary now.

It helps because TX is done only when every interface (on this taskqueue;
cross-taskqueue will require a sync point) has processed packets to completion.
It can also help fairness between interfaces on the same taskqueue, by
rotating faster to the next if.
This is not required and can be used or not on a per-driver basis (if not used,
everything can be done on RX).

There is also one fix pending for the compatibility interface: the packets per
round need to be increased because there is no feedback loop in the old API.

Fabien




Re: Polling with multiqueue support

2011-02-24 Thread Fabien Thomas
Just an update to point to another old patch that enables the flowtable on the
forwarding path to increase performance (reduce contention), to be on par with
Linux:
http://people.freebsd.org/~fabient/FreeBSDvsLinux10GB.png (forwarding 256B 
packets, % to line rate on 2x10Gb 82599 interface with 1xXeon W3680)
http://people.freebsd.org/~fabient/patch-flowtable-forward

Coupled with the polling code it performs quite well.

Lastly, a latency / polling-overhead test result:
http://people.freebsd.org/~fabient/polllatency.png

User app is the time it takes to run a CPU-bound benchmark (lower is better);
net load is fixed at a high level but leaves some CPU available.
Freq is the HZ for polling, or the measured intr frequency for that load.
Latency is measured by a Spirent STC.

Fabien






Hello

2011-04-04 Thread Fabien Thomas
Hi Kip,

Feels good to see you again!

Fabien





Re: Hello

2011-04-04 Thread Fabien Thomas
Sorry for the noise, I've missed the dest...

>   Hi Kip,
> 
> Feels good to see you again!
> 
> Fabien
> 


m_getjcl and packet zone

2010-04-30 Thread Fabien Thomas
Hi all,

While doing some 10Gb benchmarks I've found that m_getjcl does not benefit from
the packet zone.
There is a ~80% increase in FPS when applying the following patch.

256B frame driver to driver / stable_8:
- 3 765 066 FPS
- 6 868 153 FPS with the patch applied.

Is there a good reason not to commit this?

Fabien

diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h
index 158edb4..95a44a4 100644
--- a/sys/sys/mbuf.h
+++ b/sys/sys/mbuf.h
@@ -523,6 +523,9 @@ m_getjcl(int how, short type, int flags, int size)
struct mbuf *m, *n;
uma_zone_t zone;

+   if (size == MCLBYTES)
+   return m_getcl(how, type, flags);
+
args.flags = flags;
args.type = type;



Re: TCP loopback socket fusing

2010-09-14 Thread Fabien Thomas
Great,

This will maybe kill the long-standing debate about "my loopback is slow vs Linux".
To have the best of both worlds, what about a socket option to enable/disable
fusing? It can be useful when you need to see some connection "packetized".

Fabien

On 13 sept. 2010, at 13:33, Andre Oppermann wrote:

> When a TCP connection via loopback back to localhost is made the whole
> send, segmentation and receive path (with larger packets though) is still
> executed.  This has some considerable overhead.
> 
> To short-circuit the send and receive sockets on localhost TCP connections
> I've made a proof-of-concept patch that directly places the data in the
> other side's socket buffer without doing any packetization and other protocol
> overhead (like UNIX domain sockets).  The connections setup (SYN, SYN-ACK,
> ACK) and shutdown are still handled by normal TCP segments via loopback so
> that firewalling stills works.  The actual payload data during the session
> won't be seen and the sequence numbers don't move other than for SYN and FIN.
> The sequence are remain valid though.  Obviously tcpdump won't see any data
> transfers either if the connection has fused sockets.
> 
> Preliminary testing (with WITNESS and INVARIANTS enabled) has shown stable
> operation and a rough doubling of the throughput on loopback connections.
> I've tested most socket teardown cases and it behaves fine.  I'm not entirely
> sure I've got all possible path's but the way it is integrated should properly
> defuse the sockets in all situations.
> 
> Testers and feedback wanted:
> 
> http://people.freebsd.org/~andre/tcp_loopfuse-20100913.diff
> 
> -- 
> Andre
> 



Re: TCP loopback socket fusing

2010-09-14 Thread Fabien Thomas

On 14 sept. 2010, at 17:41, Andre Oppermann wrote:

> On 14.09.2010 11:18, Fabien Thomas wrote:
>> Great,
>> 
>> This will maybe kill the long time debate about "my loopback is slow vs 
>> linux"
>> To have the best of both world what about a socket option to enable/disable 
>> fusing:
>> can be useful when you need to see some connection "packetized".
> 
> A sysctl to that effect is already in the patch.
Yes, I'm just wondering about a per-connection setting.

> 
> -- 
> Andre
> 
>> Fabien
>> 
>> On 13 sept. 2010, at 13:33, Andre Oppermann wrote:
>> 
>>> When a TCP connection via loopback back to localhost is made the whole
>>> send, segmentation and receive path (with larger packets though) is still
>>> executed.  This has some considerable overhead.
>>> 
>>> To short-circuit the send and receive sockets on localhost TCP connections
>>> I've made a proof-of-concept patch that directly places the data in the
>>> other side's socket buffer without doing any packetization and other 
>>> protocol
>>> overhead (like UNIX domain sockets).  The connections setup (SYN, SYN-ACK,
>>> ACK) and shutdown are still handled by normal TCP segments via loopback so
>>> that firewalling stills works.  The actual payload data during the session
>>> won't be seen and the sequence numbers don't move other than for SYN and 
>>> FIN.
>>> The sequence are remain valid though.  Obviously tcpdump won't see any data
>>> transfers either if the connection has fused sockets.
>>> 
>>> Preliminary testing (with WITNESS and INVARIANTS enabled) has shown stable
>>> operation and a rough doubling of the throughput on loopback connections.
>>> I've tested most socket teardown cases and it behaves fine.  I'm not 
>>> entirely
>>> sure I've got all possible path's but the way it is integrated should 
>>> properly
>>> defuse the sockets in all situations.
>>> 
>>> Testers and feedback wanted:
>>> 
>>> http://people.freebsd.org/~andre/tcp_loopfuse-20100913.diff
>>> 
>>> --
>>> Andre
>>> 



Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Fabien Thomas
For your information, we have measured 730Kpps using pollng and fastforwarding
with 64-byte frames without loss (<0.001% packet loss) on a Spirent
SmartBits (Pentium D 2.8GHz + 8x Gig em).


You can find the code and some performance reports at:
http://www.netasq.com/opensource/pollng-rev1-freebsd.tgz

The best performance / CPU cost ratio is to use 1 core only; the other
cores are free to do application processing.


Fabien




Missing fix for fxp driver (FreeBSD 6.x)

2008-10-22 Thread Fabien Thomas

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/fxp/if_fxp.c.diff?r1=1.217.2.15;r2=1.217.2.16;f=h

This fix is really necessary (deadlock of the interface in case of
cluster shortage) and is not committed in 6.x (but committed in RELENG_5,
RELENG_7 and HEAD).


Regards,
Fabien

Re: Missing fix for fxp driver (FreeBSD 6.x)

2008-10-22 Thread Fabien Thomas

Sorry for the noise...
I made a mistake between my local patch, cvsweb and the commit info, which
seemed to fix the same problem but with different code.


If I had to rewrite my initial mail:

The fxp deadlock (fixed on head by this commit:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/fxp/if_fxp.c.diff?r1=1.266;r2=1.267)
can be easily reproduced; maybe it can be MFCed to 6.4 and 7.1?

When the interface is deadlocked, the only way to recover is to do an
ifconfig up.


Fabien


http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/fxp/if_fxp.c.diff?r1=1.217.2.15;r2=1.217.2.16;f=h

This fix is really necessary (deadlock of the interface in case of
cluster shortage) and is not committed in 6.x (but committed in RELENG_5,
RELENG_7 and HEAD).


Regards,
Fabien




Re: Interrupts + Polling mode (similar to Linux's NAPI)

2009-04-28 Thread Fabien Thomas


To share my results:

At work I have made modifications to the polling code to do SMP polling
(previously posted to this ml).

SMP polling (dynamic groups of interfaces bound to CPUs) does not
significantly improve the throughput (lock contention seems to be the
cause here).
The main advantage of polling with modern interfaces is not the PPS
(which is nearly the same) but the global efficiency of the system
when using multiple interfaces (which is the case for a firewall).
The best configuration we have found with FreeBSD 6.3 is to do polling
on one CPU and keep the other CPUs free for other processing. In this
configuration the whole system is more efficient than with interrupts,
where all the CPUs are busy processing interrupt threads.


Regards,
Fabien



Re: Interrupts + Polling mode (similar to Linux's NAPI)

2009-04-28 Thread Fabien Thomas


Le 28 avr. 09 à 11:04, Paolo Pisati a écrit :


Fabien Thomas wrote:


To share my results:

I have done at work modification to the polling code to do SMP  
polling (previously posted to this ml).


SMP polling (dynamic group of interface binded to CPU) does not  
significantly improve the throughput (lock contention seems to be  
the cause here).
The main advantage of polling with modern interface is not the PPS  
(which is nearly the same) but the global efficiency of the system  
when using multiple interfaces (which is the case for Firewall).
The best configuration we have found with FreeBSD 6.3 is to do  
polling on one CPU and keep the other CPU free for other  
processing. In this configuration the whole system
is more efficient than with interrupt where all the CPU are busy  
processing interrupt thread.
Out of curiosity: did you try polling on 4.x? I know it doesn't
"support" SMP over there, but last time I tried polling on 7.x (or
was it 6.x? I don't remember...)
I found it didn't give any benefit, while switching the system to
4.x showed a huge improvement.




Yes. Rewriting the core polling code was half of the work, because the
polling code on 6.x and up performs badly (in our env) regarding
performance.
Today 4.x is unbeatable regarding network perf (vs 6.2 -> 7.0 at least;
I need to do more tests on 7_stable and 8).

The other half of the work was to explore the SMP scaling of the
polling code, to regain what we lose with the fine-grained SMP kernel.



--

bye,
P.






Re: Interrupts + Polling mode (similar to Linux's NAPI)

2009-04-28 Thread Fabien Thomas


I have done at work modification to the polling code to do SMP polling
(previously posted to this ml).

SMP polling (dynamic group of interface binded to CPU) does not
significantly improve the throughput (lock contention seems to be the
cause here).
The main advantage of polling with modern interface is not the PPS
(which is nearly the same) but the global efficiency of the system when
using multiple interfaces (which is the case for Firewall).
The best configuration we have found with FreeBSD 6.3 is to do polling
on one CPU and keep the other CPU free for other processing. In this
configuration the whole system is more efficient than with interrupt
where all the CPU are busy processing interrupt thread.

out of curiosity: did you try polling on 4.x? i know it doesn't
"support" SMP over there, but last time i tried polling on 7.x (or was
it 6.x? i don't remember...)
i found it didn't gave any benefit, while switching the system to 4.x
showed a huge improvement.

yes rewriting the core polling code started at half because the polling
code on 6.x and up perform badly (in our env) regarding performance.
today 4.x is unbeatable regarding network perf (6.2 -> 7.0 at least, i
need to do more test on 7_stable and 8).

the other half of the work was to explore the SMP scaling of the
polling code to gain what we loose with fine grained SMP kernel.


The problem with all of this "analysis" is that it assumes that SMP
coding scales intuitively; when the opposite is actually true.

What you fail to address is the basic fact that moderated interrupts
(ie holding off interrupts to a set number of ints/second) is exactly
the same as polling, as on an active system you'll get exactly X
interrupts per second at equal intervals. So all of this chatter about
polling being more efficient is simply bunk.


I agree with you for one interface. When you use ten interfaces it is
not the case.



The truth is that polling requires additional overhead to the system while
interrupts do not. So if polling did better for you, it's simply because
either

1)  The polling code in the driver is better

or

2) You tuned polling better than you tuned interrupt moderation.


Barney











Re: pf and vimage

2009-08-21 Thread Fabien Thomas

Thanks, very useful!
Do you have an "official" page to look for updates?
What do you think of putting it on the FreeBSD Wiki?

Fabien

Le 20 août 09 à 18:17, Julian Elischer a écrit :


there were some people looking at adding vnet support to pf.
Since we discussed it last, the rules of the game have
significantly changed for the better. With the addition
of some new facilities in FreeBSD, the work needed to virtualize
a module has significantly decreased.


The following doc gives the new rules..


August 17 2009
Julian Elischer

===
Vimage: what is it?
===

Vimage is a framework in the BSD kernel which allows a co-operating module
to operate on multiple independent instances of its state so that it can
participate in a virtual machine / virtual environment scenario. It refers
to a part of the Jail infrastructure in FreeBSD. For historical reasons
"Virtual network stack enabled jails"(1) are also known as "vimage enabled
jails"(2) or "vnet enabled jails"(3).  The currently correct term is the
latter, which is a contraction of the first. In the future other parts of
the system may be virtualized using the same technology, and the term to
cover all such components would be VIMAGE enhanced modules.

The implementation approach taken by the vimage framework is a redefinition
of selected global state variables to evaluate to constructs that allow for
the virtualized state to be stored and resolved in appropriate instances of
'jail' specific container storage regions.  The code operating on virtualized
state has to conform to a set of rules described further below, among other
things in order to allow for all the changes to be conditionally compilable,
i.e. permitting the virtualized code to fall back to operation on global
state.


The rest of this document will discuss NETWORK virtualization
though the concepts may be true in the future for other parts of the
system.

The most visible change throughout the existing code is typically the
replacement of direct references to global variables with macros; foo_bar
thus becomes V_foo_bar.  V_foo_bar macros will resolve back to the foo_bar
global in default kernel builds, and alternatively to the logical equivalent
of some_base_pointer->_foo_bar for "options VIMAGE" kernel configs.

Prepending of "V_" prefixes to variable references helps in
visual discrimination between global and virtualized state.
It is also possible to use an alternative syntax, of VNET(foo_bar) to
achieve the same thing. The developers felt that V_foo_bar was less
visually distracting while still providing enough clues to the reader
that the variable is virtualized. In fact the V_foo_bar macro is
locally defined near the definition of foo_bar to be an alias for
VNET(foo_bar) so the two are not only equivalent, they are the same.

The framework also extends the sysctl infrastructure to support access to
virtualized state through introduction of the SYSCTL_VNET family of macros;
those also automatically fall back to their standard SYSCTL counterparts in
default kernel builds.

Transparent libkvm(3) lookups are provided to virtualized variables
which permits userland binaries such as netstat to operate unmodified
on "options VIMAGE" kernels, though this may have some security  
implications.


Vnets are associated with jails.  In 8.0, every process is associated with
a jail, usually the default (null) jail, and jails currently hang off of
a process's ucred.  This relationship defines a process's administrative
affinity to a vnet and thus indirectly to all of its state. All network
interfaces and sockets hold pointers back to their associated vnets.
This relationship is obviously entirely independent from proc->ucred->jail
bindings.  Hence, when a process opens a socket, the socket will get bound
to a vnet instance hanging off of proc->ucred->jail->vnet, but once such a
socket->vnet binding gets established, it cannot be changed for the entire
socket lifetime.

The mapping of a from a thread to a vnet should always be done via the
TD_TO_VNET macro as the path may change in the future as we get more
experience with using the system.

Certain classes of network interfaces (Ethernet in particular) can be
reassigned from one vnet to another at any time.  By definition all vnets
are independent and can communicate only if they are explicitly provided
with communication paths. Currently mainly netgraph is used to establish
inter-vnet datapaths, though other paths are being explored, such as the
'epair' back-to-back virtual interface pair, in which the two sides may
exist in different jails.

In network traffic processing the vnet affinity is defined either by the
inbound interface or by the socket / pcb -> vnet binding.  However, there
are many functions in the network stack that cannot implicitly fetch the
vnet context from their standard arguments.  Instead of explicitly
extending argument lists of

new version of polling for FreeBSD 6.x

2007-09-06 Thread Fabien THOMAS

Hi,

After many years of good services we will stop using FreeBSD 4.x :)
During my performance regression tests under FreeBSD 6.2 I found that
polling has lower performance than interrupts.
To solve that issue I've rewritten the core of polling to be more SMP-ready.


You can find a summary of all my tests and the source code at the  
following address:

http://www.netasq.com/opensource/pollng-rev1-freebsd.tgz

Feel free to ask more detailed information if necessary and report  
any bugs / comments.


Fabien




Re: new version of polling for FreeBSD 6.x

2007-09-08 Thread Fabien THOMAS

Hi,
This is really interesting work!  Reading the pdf file, it
seems forwarding performance on 6 and 7 is still much lower than
RELENG_4 ?  is that correct ?

---Mike


Thanks,

Yes, it is still slower, but as you can see in the graph (programming cost),
just adding a mutex drops the rate, and we have some on the forwarding path.

We have beaten FreeBSD 4.x with pollng on 2 cores, with the best throughput
at 7089Mb/s, but only when the test lasts 10s => maybe a periodic task that
gets some CPU time.


One really interesting thing is that FreeBSD 7.x can have great
performance: it performs slower than FreeBSD 6.x when using one CPU
(4437 vs 5017) but better when using 2 CPUs (5214 vs 5026).

While reading the pdf I've discovered a mistake in the loss
percentage: it is 0.001% and not 0.0001%.


Fabien



You can find a summary of all my tests and the source code at the
following address:
http://www.netasq.com/opensource/pollng-rev1-freebsd.tgz

Feel free to ask more detailed information if necessary and report
any bugs / comments.

Fabien





Mike Tancsa, Sentex communications http://www.sentex.net
Providing Internet Access since 1994
[EMAIL PROTECTED], (http://www.tancsa.com)





Re: new version of polling for FreeBSD 6.x

2007-09-08 Thread Fabien THOMAS


Le 8 sept. 07 à 01:05, Andre Oppermann a écrit :


Mike Tancsa wrote:

On Thu, 6 Sep 2007 15:12:06 +0200, in sentex.lists.freebsd.net you
wrote:

After many years of good services we will stop using FreeBSD 4.x :)
During my performance regression tests under FreeBSD 6.2 i've  
found  that polling has lower performance than interrupt.
To solve that issue i've rewritten the core of polling to be more  
SMP  ready.

Hi,
This is really interesting work!  Reading the pdf file, it
seems forwarding performance on 6 and 7 is still much lower than
RELENG_4 ?  is that correct ?


Haven't tested RELENG_4 performance in a controlled environment and
thus can't answer the question directly.  However using fastforward
on 6 and 7 is key to good performance.  Without it you're stuck at
some 150-200kpps, perhaps 300kpps.  With it you get to 500-800kpps.


Using net.isr.direct is the key to success and can get a much better
forwarding rate (the intermediate queue kills the performance).
I agree that using fastforwarding gives another big step, because there
is a lot less code than in the IP stack:


FreeBSD 6.2 using fastforward on 64bytes packets (L3 Mb/s):

pollng 1CPU:            156
pollng 2CPU:            123
intr:                   144
pollng 1CPU fastfwd:    221
pollng 2CPU fastfwd:    270
intr fastfwd:           211

Fabien



--
Andre


---Mike
You can find a summary of all my tests and the source code at  
the  following address:

http://www.netasq.com/opensource/pollng-rev1-freebsd.tgz

Feel free to ask more detailed information if necessary and  
report  any bugs / comments.


Fabien




Mike Tancsa, Sentex communications http://www.sentex.net
Providing Internet Access since 1994
[EMAIL PROTECTED], (http://www.tancsa.com)







Re: new version of polling for FreeBSD 6.x

2007-09-08 Thread Fabien THOMAS

Haven't tested RELENG_4 performance in a controlled environment and
thus can't answer the question directly.  However using fastforward
on 6 and 7 is key to good performance.  Without it you're stuck at
some 150-200kpps, perhaps 300kpps.  With it you get to 500-800kpps.




To show that pps is mainly related to CPU freq (with high-end components):

FreeBSD 6.2; packet size is 64 bytes; values are L3 Mb/s; between the two
runs only the CPU changes.

Xeon Woodcrest          2.33GHz  3.00GHz  ratio (3.00/2.33 = 1.287)
pollng 1CPU:            210      257      1.224
pollng 2CPU:            329      396      1.204
pollng 1CPU fastfwd:    291      364      1.251
pollng 2CPU fastfwd:    455      536      1.178

warn: this is not the same hardware as in the pdf.




Re: new version of polling for FreeBSD 6.x

2007-09-08 Thread Fabien THOMAS


Hello Fabien,

Hello :)


1- I have noticed you are not using GENERIC config file, can you  
provide us more information on how your KERNCONF differs from  
GENERIC ?
I am pretty sure you have removed all the debug OPTIONs from the  
kernel, isn't it ?


It's a GENERIC kernel conf with polling and SMP activated.
With FreeBSD 7.x I've removed WITNESS and INVARIANTS.


2- Did you get a chance to try Jumbo Frames and evaluate jumbo  
mbufs (I never success to make them work for me, did someone had  
more chance ?).
In any cases, PPS values are important for such tests. Andre is  
right, with Fast Forward you get the best perfs for such test.


No, I haven't done any jumbo frame tests; maybe on the next try.
Yes, fastforward is better, but my goal was to stress the IP stack, so
I've not integrated the fastforward results in the pdf (but you can
find some results in my reply to Andre).


3- Did you monitor the DUT to see where CPU cycles were spend in  
each test ?


Not during the real test. Profiling using hwpmc and LOCK_PROFILING
was done under the same conditions, but those results were discarded.

hwpmc uses the callgraph patch published by Joseph Koshy.



4- Have you considered measuring the time it takes for an interrupt  
to be handled and processed by the kernel Bottom/Top Half and ISR ?  
[1]


No


5- When I have performed some test using a Spirent SmartBits  
(SmartFlow) last summer I got the following results [2]. (For  
comparison purposes)


It's really difficult to compare.
For all my tests I'm always using reference hardware (not too
powerful, so as to stay in the range of the test tools).


6- In the test with Spirent Avalanche, you are using Lighttpd as  
webserver, did you enable kqueue ? how many workers ?
You are using HTTP 1.0 wo Keep-Alive, what was your  
net.inet.tcp.msl MIB's value ?


The goal of the application test was simple: I have pollng working better
than interrupts in all forwarding cases, but will my socket application
work better too?
For that I've just installed the port with the default config (log
disabled; the default is one worker).

The result of this test shows that polling is a great benefit to network
applications vs interrupts (nearly two times more connections per second).




7- Polling is known to introduce higher latency, I would expect its  
benefits to be less in 7-CURRENT compared to 6.x since (Scott ?)

a FAST INTR Handler has been introduced around a year ago.


Yes, it costs more in terms of packet latency (where FreeBSD 4.11 was
better than 6.x / 7.x in all modes), but under high pps with interrupts
the DUT is unresponsive (ithread, filter, em fastintr).


Nonetheless, what you report sounds like a perf regression...Have  
you filled a PR ? Luigi might have a good explaination here. :-D


For us polling has always worked better than interrupts under FreeBSD
4.x; under FreeBSD 6.x it is not the case, and under one of my
application benchmarks you can see that it really has a problem
sustaining the load.

Behind the new model there is more than a regression fix: I think it
will scale better on SMP and provide a good acceleration for packet
inspection.




8- Lock profiling information were obtanied through KTR ?


no: LOCK_PROFILING



9- I was wondering if you have explored Intr CPU affinity [3] and  
IRQ Swizzling  [4] ?


No, but one idea is the hybrid mode used under Solaris (interrupts when
load is low, polling otherwise).




Thanks for your efforts and your valuable contribution, best regards,
/Olivier


Kind regards,
Fabien


[1] http://xview.net/papers/interrupt_latency_scheduler/ (WARNING:  
Never find the time to finish that doc and publish it) &&

http://xview.net/research/RTBench/

[2] http://xview.net/papers/jumbo_mbufs_freebsd/

[3] http://www.intel.com/cd/ids/developer/asmo-na/eng/188935.htm? 
prn=3DY


[4] http://download.intel.com/design/chipsets/applnots/31433702.pdf
--
Olivier Warin







pollng: pcap bench

2007-09-19 Thread Fabien THOMAS

Result of pcap benchmark requested by Vlad Galu:

Using polling is better.

Test setup:
---

netblast -- em|fxp -- pcap_bmark

under FreeBSD 6.2

Small product (fxp interface):
---

pollng:

Captured 30322.00 pps (total of 333542) and dropped 144
Captured 30358.45 pps (total of 333943) and dropped 219
Captured 30253.18 pps (total of 332785) and dropped 151
Captured 30276.82 pps (total of 333045) and dropped 88
Captured 30362.64 pps (total of 333989) and dropped 369

intr:

Captured 0.01 pps (total of 6877442) and dropped 6876215

Completely stuck in intr mode, so the period takes more than 10s.

Large product (em interface):
---

pollng:
Captured 114669.09 pps (total of 1261360) and dropped 0
Captured 115263.18 pps (total of 1267895) and dropped 0
Captured 115226.45 pps (total of 1267491) and dropped 0
Captured 115003.64 pps (total of 1265040) and dropped 0

intr:

Captured 99091.91 pps (total of 1090011) and dropped 629467
Captured 105180.64 pps (total of 1156987) and dropped 617526
Captured 99722.36 pps (total of 1096946) and dropped 607367
Captured 104180.91 pps (total of 1145990) and dropped 626567





bge drivers does not work for 3COM 3C996-SX / 3C996B-T

2002-05-16 Thread Fabien THOMAS

I have some problems with the bge driver with a 3COM 3C996-SX fiber card and
a 3C996B-T copper card under -stable:

The fiber card is detected correctly but the link does not come up (I've
tested the same card between two Win2K boxes and it works well).

The copper card is detected but the link goes up/down and sometimes locks the
machine (a reboot is needed to restart) when I start a 'ping -i0 -q'.

Has someone experienced the same problems?

Regarding the missing splx: I think I've found a new one in bge_init:

static void
bge_init(xsc)
	void *xsc;
{
	struct bge_softc *sc = xsc;
	struct ifnet *ifp;
	u_int16_t *m;
	int s;

	s = splimp();

	ifp = &sc->arpcom.ac_if;

	if (ifp->if_flags & IFF_RUNNING)
		--> missing splx ?
		return;


Fabien





Re[2]: bge driver issue

2002-06-18 Thread Fabien THOMAS

I have the same problems, and I partially fixed them by bumping
the return ring count:

#define BGE_RETURN_RING_CNT 1024
->
#define BGE_RETURN_RING_CNT 2048

I don't think it is THE solution, but it works better than before for me...

ppn> We have a Dell poweredge 2650 (successor to 2550).

ppn> We also saw the same problem with 4.5. I tried the current bge driver from 4.6
ppn> without success. The problem seems to be a size problem. When we ftp a small
ppn> file, things work fine. However, when we try an 18-megabyte file, the ftp
ppn> hangs and we see the problem described below. The linux system that came
ppn> with the hardware (from dell) worked fine.

ppn> BTW. This was occurring with a 100 Mbit link.

ppn> I have not been able to get any resolution on this. The only replies seem to
ppn> indicate that something is seriously broken with the bge driver.

ppn> Paul Fronberg
ppn> [EMAIL PROTECTED]

>> I have a dell poweredge 2550 with which I am having all sorts of
>> nasty network problems.
>> The network interface will just stop responding.
>> I get an error message like this:
>> Jun 18 08:19:38 shekondar /kernel: bge0: watchdog timeout -- resetting
>> 
>> This is using the broadcom 10/100/1000 NIC on the mother board, the
>> intel 10/100 has had similar issues but produces no log messages.
>> 
>> duplex and speed settings are forced on both the card and the switch.
>> sometimes the kernel reset will clear the fault but sometimes you
>> need to ifconfig down / up the interface to get it going again.
>> 
>> This box has been running fine for several weeks, it is only as we have
>> started to shift to production levels of traffic to it that it has started
>> this. Approx 30M bits/sec out and 12M bits/sec inbound.
>> 
>> There was a simple ipfw ruleset on the box but I have disable that
>> just now to see if it helps.
>> 
>> 
>> Googling has given me people who report similar problems but no
>> solutions / workarounds.
>> 
>> Has anyone got any suggestions as to what to do next?
>> 
>> Colin
>> 
>> Here is the output of pciconf:
>> 
>> bge0@pci1:8:0:  class=0x02 card=0x00d11028 chip=0x164414e4 rev=0x12 hdr=0x00
>>     vendor   = 'Broadcom Corporation'
>>     device   = 'BCM5700/1 Gigabit Ethernet Controller'
>>     class    = network
>>     subclass = ethernet




-- 
Regards,
 Fabien  mailto:[EMAIL PROTECTED]





bpf_tap problem with PKTHDR

2002-11-26 Thread Fabien THOMAS
Hi,

It seems there is a problem in the bpf_mtap code:

in the seesent case, the code assumes that the mbuf will have a pkthdr structure.

There are two problems here:
  + the code does not check for that with (m_flags & M_PKTHDR)
  + at the upper level, the callers forge a fake mbuf that does not
  contain any pkthdr and do not initialize the m_flags field

What do you think about that?
  
if_ethersubr.c case:

/* Check for a BPF tap */
if (ifp->if_bpf != NULL) {
struct m_hdr mh;

/* This kludge is OK; BPF treats the "mbuf" as read-only */
mh.mh_next = m;
mh.mh_data = (char *)eh;
mh.mh_len = ETHER_HDR_LEN;
bpf_mtap(ifp, (struct mbuf *)&mh);
}


bpf_mtap function:
/*
 * Incoming linkage from device drivers, when packet is in an mbuf chain.
 */
void
bpf_mtap(ifp, m)
	struct ifnet *ifp;
	struct mbuf *m;
{
	struct bpf_if *bp = ifp->if_bpf;
	struct bpf_d *d;
	u_int pktlen, slen;
	struct mbuf *m0;

	pktlen = 0;
	for (m0 = m; m0 != 0; m0 = m0->m_next)
		pktlen += m0->m_len;

	for (d = bp->bif_dlist; d != 0; d = d->bd_next) {
		if (!d->bd_seesent && (m->m_pkthdr.rcvif == NULL))
			continue;
		++d->bd_rcount;
		slen = bpf_filter(d->bd_filter, (u_char *)m, pktlen, 0);
		if (slen != 0)
			catchpacket(d, (u_char *)m, pktlen, slen, bpf_mcopy);
	}
}

fabien





Re: bpf_tap problem with PKTHDR

2002-11-26 Thread Fabien THOMAS

MB> I found a similar problem with the bpf flag BIOCSSEESENT. Here is a simple 
MB> workaround: 

Yes, it's the same problem I found, but it is not limited to the
Ethernet case: virtually every bpf_mtap call must be modified to add
support for a 'real' pkthdr.

fabien





Re: Recursive encapsulation could panic the Kernel

2002-12-17 Thread Fabien THOMAS
We could use a TTL associated with the mbuf that is decremented each time
we reach a possible 'point of loop'. The bad point is that we need a
new entry in the mbuf...

fabien

VJ> Hi,

VJ> With FreeBSD, there are many ways to create a recursive local encapsulation 
VJ> loop within the IPv4 and IPv6 stack. For example, this problem shows up when :
VJ>   - Netgraph with pptp is used or Netgraph with an ng_iface over UDP or any 
VJ> more complex Netgraph topologies...
VJ>   - gre interfaces
VJ>   - gif tunnels
VJ>   - ...

VJ> There is a simple local solution that is used by gif_output() that is not 
VJ> protected by any mutex:
VJ> /*
VJ>  * gif may cause infinite recursion calls when misconfigured.
VJ>  * We'll prevent this by introducing upper limit.
VJ>  * XXX: this mechanism may introduce another problem about
VJ>  *  mutual exclusion of the variable CALLED, especially if we
VJ>  *  use kernel thread.
VJ>  */
VJ> if (++called > max_gif_nesting) {
VJ> log(LOG_NOTICE,
VJ> "gif_output: recursively called too many times(%d)\n",
VJ> called);
VJ> m_freem(m);
VJ> error = EIO;/* is there better errno? */
VJ> goto end;
VJ> }

VJ> I am wondering if a more generic solution could be found, however I do not 
VJ> have any idea yet ;-(
VJ> I mean, is it possible to protect the kernel against any panic that could 
VJ> come from a mis-configuration of the routing tables ?

VJ> Regards,
VJ>   Vincent






rl driver mac address problem

2003-03-06 Thread Fabien THOMAS
Hi,

I have a problem setting the MAC address for the rl driver
(ifconfig rl0 ether xx:xx:xx:xx:xx:xx).

The chip datasheet says that the address can be read a single byte at a
time, but it must be written 4 bytes at a time.

The patch corrects the problem and was adapted from the Linux
driver (but has a read overflow of 2 bytes).

Could someone review & commit the correction?

fabien


--- if_rl.c.orig	Thu Mar  6 11:32:58 2003
+++ if_rl.c	Thu Mar  6 11:33:37 2003
@@ -1474,7 +1474,7 @@
struct rl_softc *sc = xsc;
struct ifnet*ifp = &sc->arpcom.ac_if;
struct mii_data *mii;
-   int s, i;
+   int s;
u_int32_t   rxcfg = 0;
 
s = splimp();
@@ -1487,9 +1487,9 @@
rl_stop(sc);
 
/* Init our MAC address */
-   for (i = 0; i < ETHER_ADDR_LEN; i++) {
-   CSR_WRITE_1(sc, RL_IDR0 + i, sc->arpcom.ac_enaddr[i]);
-   }
+   CSR_WRITE_1(sc, RL_EECMD, RL_EEMODE_WRITECFG);
+   CSR_WRITE_4(sc, RL_IDR0, *(u_int32_t *)&sc->arpcom.ac_enaddr[0]);
+   CSR_WRITE_4(sc, RL_IDR4, *(u_int32_t *)&sc->arpcom.ac_enaddr[4]);
 
/* Init the RX buffer pointer register. */
CSR_WRITE_4(sc, RL_RXADDR, vtophys(sc->rl_cdata.rl_rx_buf));




ALTQ integration

2003-08-14 Thread Fabien THOMAS
What is the status of the ALTQ framework integration into FreeBSD?

OpenBSD has native support, but I think the merge with pf is a bad idea,
since it does not allow third-party classifiers.

fabien



em driver problem (system lock)

2004-05-10 Thread Fabien THOMAS
Hi,

We use a lot of Intel gigabit cards, and since we first started using them
we have experienced strange hard locks of the system (4.9|FreeBSD-stable).
We have tried several driver versions (it is not related to a version). We
use the cards in polling mode, but it seems the problem can be triggered
even in interrupt mode.
What I found while debugging on a fiber card:

1) the original driver did not lock, but when the other end is rebooted I
see around 10 linkup/linkdown transitions
2) after removing the linkup/linkdown printf: the driver locks each time
the other end system is rebooted!
3) removing the E1000_IMC_RXSEQ in disable_intr fixes the lock, but I do
not understand why:
 a) does E1000_IMC_RXSEQ need to be left enabled when disabling intr?
 b) why would the system completely lock (even under the debugger) over
just one interrupt source left enabled?
static void
em_disable_intr(struct adapter *adapter)
{
	E1000_WRITE_REG(&adapter->hw, IMC,
	    (0xffffffff)); /* & ~E1000_IMC_RXSEQ)); */
	return;
}
What do you think of that?

fabien



smime.p7s
Description: S/MIME Cryptographic Signature