Re: new ARP code review

2007-06-19 Thread Julian Elischer

Luigi Rizzo wrote:

On Mon, Jun 18, 2007 at 05:51:44PM +, Qing Li wrote:

[luigi:]
i agree that the timing is a bit tight for inclusion, especially 
because the work dates back to 2004 if not before, and i think Qing 
Li took over development at least two years ago - not a great track 
record in terms of dedication to the work. I'd rather not see it 
rushed in :)



   Not sure how to respond to your comment here ...


it wasn't meant as criticism, but just a consideration that there is no
point to rush this change in when it has been idle for so long.
Stalls occur for many reasons, I (and maybe others) thought you
were busy on other stuff, maybe you were waiting for more feedback.
But the bottom line is that we are now in a code freeze and this doesn't
seem a good time for pushing something in. Add to this that Andre is
temporarily on holidays.
I hope now people will give you the feedback that you
hoped to get a couple of years ago.


   I emailed to net@ and developers@ for review after I put in the support
   for IPv6, and made the new functions generic more than two years ago. I
   received one full review from Gleb and a partial review from Andre. And 
   that patch has been sitting there in my home directory on
   people.freebsd.org/~qingli ever since. The very last patch I put there is 
   dated April 19, 2005 (for the then -current). This time around, I got two
   other reviews, and that's it. 


   I'm certainly open to any suggestions on how to get more reviews
   from the community. And let me know if you have any other specific 
   work items that you want done so you don't feel rushed.

...

the splitting is exactly the goal of this work and is by design.
The mapping between the L3 and L2 addresses has nothing to do with
the IP route lookup, and it should be elsewhere (namely, in the hash
table or whatever data structure is appropriate).

Eventually, with this structure you can do the route lookup
only when you need to find the next hop (e.g. when a route
changes etc.) and just the much-cheaper L3-L2 map in other cases.

Even if the current implementation keeps doing both, this change
is a step towards a separation of the two functions and enabling
more cleanup in the code.
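
A minimal sketch of the separation described above, written in Python rather than kernel C. Every name in it (RouteTable, NeighborTable, output_path) is hypothetical and nothing is taken from the patch under review. The route lookup is a toy longest-prefix match that yields a next hop and an interface; the L3-L2 mapping is a separate per-interface table consulted afterwards.

```python
import ipaddress


class RouteTable:
    """Toy longest-prefix-match table: prefix -> (gateway or None, interface)."""

    def __init__(self):
        self.routes = []

    def add(self, prefix, gateway, ifname):
        self.routes.append((ipaddress.ip_network(prefix), gateway, ifname))

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [r for r in self.routes if addr in r[0]]
        if not matches:
            return None
        _net, gateway, ifname = max(matches, key=lambda r: r[0].prefixlen)
        # An on-link route has no gateway: the next hop is the destination.
        return (gateway or dst, ifname)


class NeighborTable:
    """Per-interface L3 -> L2 map, independent of the routing table."""

    def __init__(self):
        self.entries = {}

    def learn(self, ip, mac):
        self.entries[ip] = mac

    def resolve(self, ip):
        return self.entries.get(ip)


def output_path(routes, neighbors, dst):
    """Step 1: route lookup for the next hop. Step 2: L3 -> L2 map."""
    hop = routes.lookup(dst)
    if hop is None:
        return None
    next_hop, ifname = hop
    return (ifname, next_hop, neighbors[ifname].resolve(next_hop))
```

Once the next hop is known, only the second step has to be repeated, which is the cheaper of the two in this toy model.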

I hope you don't disagree on the design. As for actual performance,
we may pay something, as we did if you compare 4.x and 6.x/7.x,
but then the opportunities for parallelization, reduction of
contention and further code simplifications are well worth it.

   The current code necessary for creating ARP entries through 
   arp_rtrequest() and the subsequent call paths are convoluted and 
   difficult to understand. The same approach was imported into the ND6 code.

   This work has eliminated this kind of code, and the logic flows much
   better. 

   A couple of people raised the two-lookup performance issue, but 
   "Do you agree in principle ..." is exactly the kind of review I was
   hoping for and have not received so far. This was the gating issue 
   for me for proceeding further two years ago and remains so today.


Obviously i totally agree with the principle, and even with the
implementation, having discussed the original
design with Andre (and implemented it). I think the motivations i gave
above are hard to criticize.
Certainly, it would be good to put somewhere in the code a few
comments (even just the previous paragraphs in this email)
describing the design goals (and possibly open issues
and/or possible-but-yet-unimplemented optimizations).
This should address the concerns on performance that people may have.

I might have a few style comments (e.g. putting the small block
first in the if/then/else blocks) and also, of course, complete
the locking (you mentioned it is incomplete; i see #if 0'ed code,
and i did not address locking issues back in 2004 because this code
was still under Giant.)

gosh it's been a few years since I was in that code, but here goes...

I have some thoughts on this.
firstly, while it is interesting to have an arp table (ok LLA table) 
on each interface, I'm not sure that it gains you very much.


As mentioned elsewhere, the connection of the arp information with the 
routing table means that the arp lookup is virtually free.

Or, at least it used to be in the Uniprocessor world. It's hard to beat free.

I can imagine however that the situation has changed since locking became
a factor. I suppose it depends upon what locking is required in arplookup() to 
make sure that the route (rt) is not modified while the  ll info is being 
extracted.
What are the locking ramifications? 

The comment 
"Eventually, with this structure you can do the route lookup
only when you need to find the next hop (e.g. when a route
changes etc.) and just the much-cheaper L3-L2 map in other cases."
makes me wonder... If we are not caching the arp code in the route any more,
then how do we avoid doing a route lookup on each packet?
I've looked at the patch for a few minutes and I haven't spotted the llad

Re: new ARP code review

2007-06-19 Thread Bruce M. Simpson

Julian Elischer wrote:


I have some thoughts on this.
firstly, while it is interesting to have an arp table (ok LLA table) 
on each interface, I'm not sure that it gains you very much.


Unfortunately maintaining a single ARP table is insufficient for 
supporting multiple paths within the IPv4 stack. Even without supporting 
multiple routing paths, we would still need to break out the ARP cache 
in this way so as to support being attached to the same layer 2 domain 
properly (ie two network cards on the same Ethernet segment or switch). 
At the moment if_bridge and netgraph are our get-out-of-jail-free cards, 
they cause the IPv4 stack to be bypassed.


As mentioned elsewhere, the connection of the arp information with the 
routing table means that the arp lookup is virtually free.
Or, at least it used to be in the Uniprocessor world. It's hard to 
beat free.


It's hard to beat hard figures, which is something we don't have at the 
moment.


What we do have is a set of design considerations. Intuition would 
suggest that one lock performs better than two, however, it depends on 
the nature of the lock and on the nature of the data structure lookup.




The comment "Eventually, with this structure you can do the route lookup
only when you need to find the next hop (e.g. when a route
changes etc.) and just the much-cheaper L3-L2 map in other cases."
makes me wonder... If we are not caching the arp code in the route any 
more, then how do we avoid doing a route lookup on each packet?


I don't think you can ever avoid doing a lookup of any kind per packet 
if you're running a router. What you can do is amortize lookup cost over 
time, e.g. two expensive initial lookups followed by one cheaper lookup 
for subsequent packets.


Whatever happens, though, has to play nice with policy forwarding and 
source selection.


This is what complicates matters - otherwise I'd just suggest keeping a 
per-interface hash of ARP entries, an IPv4 routing trie, and a 
per-destination cache hash which returns the combined lookup against the 
trie and the L2 hash -- pretty much what Luigi is suggesting.
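
A rough sketch of that per-destination cache, in Python and with hypothetical names throughout (DestinationCache, route_lookup, l2_lookup, the generation counter). It deliberately ignores policy forwarding and source selection, which are exactly the parts said above to complicate matters. The first packet to a destination pays for both lookups; later packets hit the cache until a route or neighbor change bumps the generation.

```python
class DestinationCache:
    """Caches the combined result of a route lookup and an L2 lookup."""

    def __init__(self, route_lookup, l2_lookup):
        self.route_lookup = route_lookup  # dst -> (next_hop, ifname) or None
        self.l2_lookup = l2_lookup        # (ifname, next_hop) -> mac or None
        self.generation = 0
        self.cache = {}
        self.slow_lookups = 0             # counts full two-step lookups

    def invalidate(self):
        """Call on any route or neighbor change; stale entries are ignored."""
        self.generation += 1

    def resolve(self, dst):
        hit = self.cache.get(dst)
        if hit is not None and hit[0] == self.generation:
            return hit[1]                 # cheap path: one hash lookup
        self.slow_lookups += 1
        hop = self.route_lookup(dst)
        if hop is None:
            return None
        next_hop, ifname = hop
        mac = self.l2_lookup(ifname, next_hop)
        if mac is None:
            return None                   # unresolved neighbor: do not cache
        result = (ifname, mac)
        self.cache[dst] = (self.generation, result)
        return result
```

A single global generation counter is the bluntest possible invalidation scheme; it is used here only to keep the sketch short.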




BTW having a per interface arp table does make sense if there is a 
particular thread that is responsible for that interface as only it 
would need access to the table and it could be done lock-free if one 
was careful enough.


The ARP code has to change, that much is certain, but the locking 
strategy has yet to be decided. ARP entries are read far more often than 
they are written, so it seems reasonable that a different lock is used.
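
One possible reading of "a different lock" for read-mostly data is a reader/writer lock. The sketch below is only an illustration of that idea in Python; the class names are hypothetical, it is not the strategy chosen for the ARP code (none has been decided), and this simple version can starve writers under a steady stream of readers.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or one writer, but never both."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class NeighborCache:
    """Read-mostly L3 -> L2 table guarded by the lock above."""

    def __init__(self):
        self.lock = ReadWriteLock()
        self.entries = {}

    def lookup(self, ip):
        self.lock.acquire_read()
        try:
            return self.entries.get(ip)
        finally:
            self.lock.release_read()

    def learn(self, ip, mac):
        self.lock.acquire_write()
        try:
            self.entries[ip] = mac
        finally:
            self.lock.release_write()
```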


BMS
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: pf 4.1 Update available for testing

2007-06-19 Thread Florian C. Smeets
Also sending this to the list(s) so people can see that the patches
actually work ;-)

Max Laier wrote:
> On Tuesday 19 June 2007, you wrote:
>> Max Laier wrote:
>>> On Wednesday 13 June 2007, you wrote:
>>>> Just as a data point. Will be happy to test altq as soon as it works
>>>> ;-)
>>> Just sent an update to the list - ALTQ should be working now.
>> Yes, works fine. No ill effects observed.
>>
>> This is a "please get this into 7.0" from me if that is still
>> possible...
>
> I'm planning on it, but sending this to the list as well would help,
> too.
> It will also get others to test - I hope.
>
>




Re: Issue with huge numbers of connections

2007-06-19 Thread Norberto Meijome
On Sun, 17 Jun 2007 19:06:16 +0100
Joe Holden <[EMAIL PROTECTED]> wrote:

> kern.ipc.nmbclusters

FWIW, this one in particular (it controls mbuf clusters) made a huge
difference back in the FBSD 4 days on very heavily used websites. I've had them
tuned up to the order of almost 100K - over that they would lock up on boot -
the lock ups don't seem to happen anymore on 6, but YMMV.
BTW, when the servers I used to run experienced mbuf exhaustion, the machines /
OS would still be operational, but nothing would happen at the network layer.
A reboot was the only solution I found at the time.

P Jeremy made a v. good point about the timeouts of the close states - bring
everything down to the minimum that makes sense to your app - the defaults are
horribly "kind" to lazy/slow clients :P

Service specific configurations may also affect how your resources are used
(for example, don't use HTTP keep alives as they hog priceless resources). I
know, pretty obvious, but might as well mention it.

B

_
{Beto|Norberto|Numard} Meijome

"But I don't have to know an answer. I don't feel frightened by not knowing
things, by being lost in the mysterious universe without having any purpose,
which is the way it really is, as far as I can tell, possibly. It doesn't
frighten me." Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Timestamp option and srtt

2007-06-19 Thread kapil jain
Hi, 
  I have a question about the RTT/srtt calculation in the presence of the 
timestamp option. 
If the timestamp option is not present, RTT is not calculated for 
retransmits, due to Karn's algorithm. However, with timestamps even 
retransmits factor into the RTT calculation. I understand that this is 
useful in general.
 
Now consider a scenario where some intermediate link/router goes down 
for 30 secs; the packet would then be retransmitted at, let's say, 
+1,+2,+4,+8,+16 seconds. Now let's say the 4th transmission comes back 
before the timeout expires but takes 10 secs (due to the timing 
of the intermediate link/router coming up). In this case we would end up 
with a huge srtt value, giving a next timeout of ~30 secs. This will 
then decrease very slowly as valid acks come in. This failure was 
a one-time event but it still affects throughput quite a bit. 
Now if this link/router keeps going down for 20-30 secs every few 
minutes, then the srtt will never really return to normal values, even 
though the rtt is very small while the router is up. 
Is this the expected behaviour? This could cause problems with protocols such 
as BGP, which have a hold timer and will then reset the connection even for one 
keep-alive loss.
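
To make the arithmetic concrete, here is a small sketch of the textbook estimator from RFC 6298 (floating point, alpha = 1/8, beta = 1/4, RTO = SRTT + 4 * RTTVAR). It is not the kernel's code: the kernel's own arithmetic, clock granularity and RTO clamping are not modelled, and the starting values of 0.1 s and 0.05 s are assumptions picked only for illustration.

```python
def update_rtt(srtt, rttvar, sample, alpha=1 / 8, beta=1 / 4):
    """One RFC 6298 update step; all values are in seconds."""
    # RTTVAR is updated first, using the SRTT from before this sample.
    rttvar = (1 - beta) * rttvar + beta * abs(srtt - sample)
    srtt = (1 - alpha) * srtt + alpha * sample
    return srtt, rttvar


def rto(srtt, rttvar, k=4):
    """Retransmission timeout without granularity or min/max clamping."""
    return srtt + k * rttvar


def simulate(samples, srtt, rttvar):
    """Feed RTT samples in order; return (srtt, rttvar, rto) after each."""
    history = []
    for sample in samples:
        srtt, rttvar = update_rtt(srtt, rttvar, sample)
        history.append((srtt, rttvar, rto(srtt, rttvar)))
    return history


if __name__ == "__main__":
    # One 10 s sample (the delayed retransmit), then twenty normal 0.1 s samples.
    for srtt, rttvar, timeout in simulate([10.0] + [0.1] * 20, 0.1, 0.05):
        print(f"srtt={srtt:.3f} rttvar={rttvar:.3f} rto={timeout:.3f}")
```

Under these assumed starting values the single 10 s sample lifts the computed RTO from 0.3 s to roughly 11.4 s, and it then falls back only gradually as normal samples arrive, which matches the slow recovery described above even if the exact figures differ.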
   
  Could we have an option to turn off RTT updates for retransmissions even when 
the TS option is turned on?
  Or have a way to reset the timeout back to its initial value
  instead of it starting from a huge value after a link failure? 
 
 
  thanks 
 kapil 


how do you bring IPv6 live without reboot?

2007-06-19 Thread George Michaelson

on a 6-STABLE host, I added:

ipv6_enable="YES"  
ipv6_network_interfaces="bge1"

to rc.conf, and ran /etc/rc.d/network_ipv6

this did not bring IPv6 live. rtsol reported problems with get_llflag()
calls. However across reboot, the system came up with IPv6 fine.

Can somebody explain why this won't work if run after the init sequence
has run to completion? What is the sequence of commands that when run
on an active FreeBSD system causes it to successfully bind to IPv6?

-George


Re: how do you bring IPv6 live without reboot?

2007-06-19 Thread Bruce A. Mah
If memory serves me right, George Michaelson wrote:
> on a 6-STABLE host, I added:
> 
> ipv6_enable="YES"  
> ipv6_network_interfaces="bge1"
> 
> to rc.conf, and ran /etc/rc.d/network_ipv6
> 
> this did not bring IPv6 live. rtsol reported problems with get_llflag()
> calls. However across reboot, the system came up with IPv6 fine.
> 
> Can somebody explain why this won't work if run after the init sequence
> has run to completion? What is the sequence of commands that when run
> on an active FreeBSD system causes it to successfully bind to IPv6?

Hrm.  You might also need to invoke /etc/rc.d/auto_linklocal before (I
think it's before?) network_ipv6.  This script was recently added
(during the 6.2 release cycle) as a part of mitigating some security
risks related to IPv6 link-local addresses.

Bruce.






soekris/sis tx checksum problems

2007-06-19 Thread David Cornejo

Hi,

I am using CURRENT on a Soekris 4801 (sis ethernet).  With a recent 
kernel all TCP packets sent via sis0 have a bad checksum.  Other 
systems using other interface types (though I don't have a broad 
selection to test) don't seem to suffer from this problem.


There was a thread in freebsd-current describing the same/similar 
problem, but there were few complaints (i think they were other 
brands of IF) and no resolution to it, so I'm bringing it up here.


dave c
