Re: new ARP code review
Luigi Rizzo wrote:

On Mon, Jun 18, 2007 at 05:51:44PM +, Qing Li wrote:

[luigi:] i agree that the timing is a bit tight for inclusion, especially because the work dates back to 2004 if not before, and i think Qing Li took over development at least two years ago - not a great track record in terms of dedication to the work. I'd rather not see it rushed in :)

Not sure how to respond to your comment here ...

it wasn't meant as criticism, but just a consideration that there is no point in rushing this change in when it has been idle for so long. Stalls occur for many reasons, I (and maybe others) thought you were busy on other stuff, maybe you were waiting for more feedback. But the bottom line is that we are now in a code freeze and this doesn't seem a good time for pushing something in. Add to this that Andre is temporarily on holidays. I hope now people will give you the feedback that you hoped to get a couple of years ago.

I emailed net@ and developers@ for review after I put in the support for IPv6, and made the new functions generic more than two years ago. I received one full review from Gleb and a partial review from Andre. And that patch has been sitting there in my home directory on people.freebsd.org/~qingli ever since. The very last patch I put there is dated April 19, 2005 (for the then -current). This time around, I got two other reviews, and that's it. I'm certainly open to any suggestion on how to get more reviews from the community. And let me know if you have any other specific work items that you want done so you don't feel rushed.

... the splitting is exactly the goal of this work and is by design. The mapping between the L3 and L2 addresses has nothing to do with the IP route lookup, and it should be elsewhere (namely, in the hash table or whatever data structure is appropriate). Eventually, with this structure you can do the route lookup only when you need to find the next hop (e.g. when a route changes etc.) and just the much-cheaper L3-L2 map in other cases. Even if the current implementation keeps doing both, this change is a step towards a separation of the two functions and enabling more cleanup in the code. I hope you don't disagree on the design. As for actual performance, we may pay something, as we did if you compare 4.x and 6.x/7.x, but then the opportunities for parallelization, reduction of contention and further code simplifications are well worth it.

The current code necessary for creating ARP entries through arp_rtrequest(), and the subsequent call paths, are convoluted and difficult to understand. The same approach was imported in the ND6 code. This work has eliminated that kind of code, and the logic flows much better. A couple of people raised the two-lookup performance issue, but "Do you agree in principle ..." is exactly the kind of review I was hoping for, and I have received none so far. This was the gating issue for me for proceeding further two years ago and remains so today.

Obviously i totally agree with the principle, and even with the implementation, having discussed the original design with Andre (and implemented it). I think the motivations i gave above are hard to criticize. Certainly, it would be good to put somewhere in the code a few comments (even just the previous paragraphs in this email) describing the design goals (and possibly open issues and/or possible-but-yet-unimplemented optimizations). This should address the concerns on performance that people may have. I might have a few style comments (e.g. putting the small block first in the if/then/else blocks) and also, of course, complete the locking (you mentioned it is incomplete; i see #if 0'ed code, and i did not address locking issues back in 2004 because this code was still under Giant.)

gosh it's been a few years since I was in that code, but here goes...

I have some thoughts on this. Firstly, while it is interesting to have an arp table (ok LLA table) on each interface, I'm not sure that it gains you very much. As mentioned elsewhere, the connection of the arp information with the routing table means that the arp lookup is virtually free. Or, at least it used to be in the Uniprocessor world. It's hard to beat free. I can imagine however that the situation has changed since locking became a factor. I suppose it depends upon what locking is required in arplookup() to make sure that the route (rt) is not modified while the ll info is being extracted. What are the locking ramifications?

The comment "Eventually, with this structure you can do the route lookup only when you need to find the next hop (e.g. when a route changes etc.) and just the much-cheaper L3-L2 map in other cases." makes me wonder.. If we are not caching the arp code in the route any more, then how do we avoid doing a route lookup on each packet?

I've looked at the patch for a few minutes and I haven't spotted the llad
Re: new ARP code review
Julian Elischer wrote:

I have some thoughts on this. firstly, while it is interesting to have an arp table (ok LLA table) on each interface, I'm not sure that it gains you very much.

Unfortunately maintaining a single ARP table is insufficient for supporting multiple paths within the IPv4 stack. Even without supporting multiple routing paths, we would still need to break out the ARP cache in this way so as to support being attached to the same layer 2 domain properly (i.e. two network cards on the same Ethernet segment or switch). At the moment if_bridge and netgraph are our get-out-of-jail-free cards; they cause the IPv4 stack to be bypassed.

As mentioned elsewhere, the connection of the arp information with the routing table means that the arp lookup is virtually free. Or, at least it used to be in the Uniprocessor world. It's hard to beat free.

It's hard to beat hard figures, which is something we don't have at the moment. What we do have is a set of design considerations. Intuition would suggest that one lock performs better than two; however, it depends on the nature of the lock and on the nature of the data structure lookup.

The comment "Eventually, with this structure you can do the route lookup only when you need to find the next hop (e.g. when a route changes etc.) and just the much-cheaper L3-L2 map in other cases." makes me wonder.. If we are not caching the arp code in the route any more, then how do we avoid doing a route lookup on each packet?

I don't think you can ever avoid doing a lookup of some kind per packet if you're running a router. What you can do is amortize lookup cost over time, e.g. two expensive initial lookups followed by one cheaper lookup for subsequent packets. Whatever happens, though, has to play nice with policy forwarding and source selection. This is what complicates matters - otherwise I'd just suggest keeping a per-interface hash of ARP entries, an IPv4 routing trie, and a per-destination cache hash which returns the combined lookup against the trie and the L2 hash -- pretty much what Luigi is suggesting.

BTW having a per interface arp table does make sense if there is a particular thread that is responsible for that interface as only it would need access to the table and it could be done lock-free if one was careful enough.

The ARP code has to change, that much is certain, but the locking strategy has yet to be decided. ARP entries are read far more often than they are written, so it seems reasonable that a different lock is used.

BMS

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
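[Editorial illustration, not part of the original thread.] The three-structure layout described in the message above (a per-interface hash of ARP entries, a routing table, and a per-destination cache holding the combined result) can be sketched roughly as follows. This is a toy model under stated assumptions, not the patch under review: the class name `Forwarder`, the interface name `em0`, and all addresses are hypothetical, a linear longest-prefix scan stands in for the routing trie, and policy forwarding, source selection and locking are ignored entirely.

```python
import ipaddress


class Forwarder:
    """Toy model: routing table + per-interface ARP hash + per-destination cache."""

    def __init__(self):
        self.routes = []        # (network, next_hop or None, ifname); stand-in for a trie
        self.arp = {}           # ifname -> {IPv4 address: MAC string}
        self.dest_cache = {}    # destination -> (ifname, MAC): the combined result
        self.slow_lookups = 0   # counts lookups that missed the destination cache

    def add_route(self, prefix, ifname, next_hop=None):
        hop = ipaddress.ip_address(next_hop) if next_hop is not None else None
        self.routes.append((ipaddress.ip_network(prefix), hop, ifname))
        self.dest_cache.clear()  # crude invalidation whenever a route changes

    def add_arp(self, ifname, ip, mac):
        self.arp.setdefault(ifname, {})[ipaddress.ip_address(ip)] = mac
        self.dest_cache.clear()  # crude invalidation whenever an ARP entry changes

    def lookup(self, dst):
        dst = ipaddress.ip_address(dst)
        hit = self.dest_cache.get(dst)
        if hit is not None:
            return hit           # cheap path: a single hash lookup
        self.slow_lookups += 1
        # Expensive lookup 1: longest-prefix match over the routing table.
        matches = [r for r in self.routes if dst in r[0]]
        if not matches:
            return None
        _, next_hop, ifname = max(matches, key=lambda r: r[0].prefixlen)
        # Expensive lookup 2: L3 -> L2 map on the outgoing interface.
        l3 = next_hop if next_hop is not None else dst
        mac = self.arp.get(ifname, {}).get(l3)
        if mac is None:
            return None          # a real stack would send an ARP request here
        self.dest_cache[dst] = (ifname, mac)
        return ifname, mac
```

The first packet to a destination pays for both lookups; later packets to the same destination hit only `dest_cache`, which is the amortization argued for above. Clearing the whole cache on any change is the simplest correct invalidation for a sketch and would likely be too coarse in practice.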
Re: pf 4.1 Update available for testing
Also sending this to the list(s) so people can see that the patches actually work ;-)

Max Laier wrote:
> On Tuesday 19 June 2007, you wrote:
>> Max Laier wrote:
>>> On Wednesday 13 June 2007, you wrote:
>>>> Just as a data point. Will be happy to test altq as soon as it works ;-)
>>> Just sent an update to the list - ALTQ should be working now.
>> Yes, works fine. No ill effects observed.
>>
>> This is a "please get this into 7.0" from me if that is still
>> possible...
>
> I'm planning on it, but sending this to the list as well would help, too.
> It will also get others to test - I hope.
>
Re: Issue with huge numbers of connections
On Sun, 17 Jun 2007 19:06:16 +0100 Joe Holden <[EMAIL PROTECTED]> wrote:

> kern.ipc.nmbclusters

FWIW, this one in particular (controls mbuf clusters) made a huge difference back in the FBSD 4 days on very heavily used websites. I've had them tuned up to the order of almost 100K - over that they would lock up on boot - the lock ups don't seem to happen anymore on 6, but YMMV.

BTW, when the servers I used to run experienced mbuf exhaustion, the machines / OS would still be operational, but nothing would happen at the network layer. A reboot was the only solution I found at the time. P

Jeremy made a v. good point about the timeouts of the close states - bring everything down to the minimum that makes sense to your app - the defaults are horribly "kind" to lazy/slow clients :P

Service specific configurations may also affect how your resources are used (for example, don't use HTTP keep alives as they hog priceless resources). I know, pretty obvious, but might as well mention it.

B
_
{Beto|Norberto|Numard} Meijome

"But I don't have to know an answer. I don't feel frightened by not knowing things, by being lost in the mysterious universe without having any purpose, which is the way it really is, as far as I can tell, possibly. It doesn't frighten me." Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Timestamp option and srtt
Hi,

I have a question about the RTT/srtt calculation in the presence of the timestamp option. If the timestamp option is not present, RTT is not calculated for retransmits, due to Karn's algorithm. However, with timestamps even retransmits factor into the RTT calculation. I understand that this is useful in general.

Now consider a scenario where some intermediate link/router goes down for 30 secs; then the packet would be retransmitted at, let's say, +1, +2, +4, +8, +16 seconds. Now let's say the ACK for the 4th transmission comes back before the timeout expires, but it takes 10 secs (due to the timing of the intermediate link coming up). In this case we would end up with a huge srtt value, giving the next timeout as ~30 secs. This will then decrease very slowly with valid acks coming in. This failure was one-time but it still affects throughput quite a bit.

Now if this link/router keeps going down for 20-30 secs every few minutes, then the srtt will never really go back to normal values, though for the duration that the router is up the rtt is very small.

Is this the expected behaviour? This could cause problems with protocols such as BGP which have a hold-timer and will then reset the connection even for one Keep-alive loss.

Could we have an option to turn off RTT updates for retransmissions even when the TS option is turned on? Or have a way to reset the timeout back to the initial value instead of it starting from a huge value after a link failure?

thanks
kapil
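[Editorial illustration, not part of the original message.] The effect described above can be reproduced with a textbook smoothed-RTT estimator. The sketch below assumes the RFC 2988-style update (gains of 1/8 and 1/4, RTO = srtt + 4 * rttvar) in floating point; it is not FreeBSD's fixed-point implementation, it applies no minimum or maximum RTO clamp, and the 50 ms baseline RTT is a made-up value. The function names `update_rtt` and `simulate` are hypothetical.

```python
def update_rtt(srtt, rttvar, sample, alpha=1 / 8, beta=1 / 4):
    """Apply one RTT sample (seconds); return the new (srtt, rttvar, rto)."""
    if srtt is None:
        # First measurement: seed the estimator from the sample.
        srtt, rttvar = sample, sample / 2
    else:
        # rttvar is updated with the old srtt, then srtt is smoothed.
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - sample)
        srtt = (1 - alpha) * srtt + alpha * sample
    return srtt, rttvar, srtt + 4 * rttvar


def simulate(samples):
    """Feed a sequence of RTT samples; return the RTO after each one."""
    srtt = rttvar = None
    rtos = []
    for sample in samples:
        srtt, rttvar, rto = update_rtt(srtt, rttvar, sample)
        rtos.append(rto)
    return rtos


# 50 quiet samples at 50 ms, one 10 s sample taken from a retransmitted
# segment (allowed because timestamps are in use), then 10 quiet samples.
rtos = simulate([0.05] * 50 + [10.0] + [0.05] * 10)
print("before outage:    %.3f s" % rtos[49])
print("after 10 s sample: %.3f s" % rtos[50])
print("10 ACKs later:    %.3f s" % rtos[-1])
```

Under these assumptions a single 10 s sample pushes the RTO above 10 s, and it is still above 1 s ten normal ACKs later, because the variance term decays by only a quarter per sample. Skipping the update for retransmitted segments, as Karn's algorithm does without timestamps, would leave the estimator at its pre-outage value.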
how do you bring IPv6 live without reboot?
on a 6-STABLE host, I added:

ipv6_enable="YES"
ipv6_network_interfaces="bge1"

to rc.conf, and ran /etc/rc.d/network_ipv6

this did not bring IPv6 live. rtsol reported problems with get_llflag() calls. However across reboot, the system came up with IPv6 fine.

Can somebody explain why this won't work if run after the init sequence has run to completion? What is the sequence of commands that when run on an active FreeBSD system causes it to successfully bind to IPv6?

-George
Re: how do you bring IPv6 live without reboot?
If memory serves me right, George Michaelson wrote:
> on a 6-STABLE host, I added:
>
> ipv6_enable="YES"
> ipv6_network_interfaces="bge1"
>
> to rc.conf, and ran /etc/rc.d/network_ipv6
>
> this did not bring IPv6 live. rtsol reported problems with get_llflag()
> calls. However across reboot, the system came up with IPv6 fine.
>
> Can somebody explain why this won't work if run after the init sequence
> has run to completion? What is the sequence of commands that when run
> on an active FreeBSD system causes it to successfully bind to IPv6?

Hrm. You might also need to invoke /etc/rc.d/auto_linklocal before (I think it's before?) network_ipv6. This script was recently added (during the 6.2 release cycle) as a part of mitigating some security risks related to IPv6 link-local addresses.

Bruce.
soekris/sis tx checksum problems
Hi,

I am using CURRENT on a Soekris 4801 (sis ethernet). With a recent kernel all TCP packets sent via sis0 have a bad checksum. Other systems using other interface types (though I don't have a broad selection to test) don't seem to suffer from this problem.

There was a thread in freebsd-current describing the same/similar problem, but there were few complaints (I think they were other brands of IF) and no resolution to it, so I'm bringing it up here.

dave c