Generic ioctl and ether_ioctl don't agree
Hi folks,

Quite a while ago I noticed that our ioctl handlers get the ioctl
command via u_long, but ether_ioctl()'s command argument is int.
This disarray dates back to 1998, when ioctl functions started to
take u_long as the command, but ether_ioctl() was never fixed.
Fortunately, our ioctl command coding still fits in 32 bits, or
else we would've got problems on 64-bit arch'es already.  I'd like
to fix this long-standing bug some day after RELENG_7 is branched.
Of course, this will break ABI to network modules on all 64-bit
arch'es.  BTW, the same applies to other L2 layers, such as firewire,
which seems to have been cloned from if_ethersubr.c.

Any objections or comments?  Thanks!

--
Yar
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Who is to load dummynet.ko?
On Tue, Mar 13, 2007 at 12:45:43AM -0700, Luigi Rizzo wrote:
> On Sat, Mar 10, 2007 at 06:35:34PM +0300, Yar Tikhiy wrote:
> > Hi folks,
> >
> > Just noticed that neither ipfw(8) nor /etc/rc.d/ipfw cares to load
> > dummynet.ko.  It can result in a broken setup when one migrates
> > from a custom monolithic kernel to GENERIC with modules, which is
> > a nice way to reduce support headache today.
> >
> > There are at least two possible ways to deal with the issue.  The
> > easy way is to give the task of loading dummynet.ko to /etc/rc.d/ipfw.
> > The problem with it is that the script cannot know in advance if
> > dummynet is really used by the ipfw rules to be loaded.  The decision
> > whether to load the module is left to rc.conf(5) in this case.
>
> i think this is a reasonable option and the one to use for all ipfw
> extensions (divert, dummynet, in-kernel nat and so on).

Thank you for your reply!  Loading dummynet via rc.d is really easy,
I think I can add it.  However, I'm afraid it won't be a real long-term
solution -- please see below.

> Making the load on demand would require a bit of additional code because
> it depends on the actual rules being loaded, and the rules are not
> parsed at load time.  Plus, i believe that in a case like this
> the decision of which modules to load should be a conscious one
> taken upfront by the system administrator (i.e. end up in rc.conf
> or loader.conf) rather than be the result of the actual ipfw
> configuration.

Well, I used to stick to this opinion, too, in the good old days.
But today we are growing more and more modularity in our kernel,
and it's a nice feature to have.  With a lot of modules, the issue
of double configuration appears: if I want feature FOO, I have to
add its configuration AND not forget to load the respective module.
It can be a pain as the number of such cases rockets up.  Today at
least mount, ifconfig, and netgraph provide for loading modules on
demand, with the former two being system's core components.
I've just taken a look at the ipfw userland utility code.  It notices
a "pipe" or "queue" keyword in its command line rather early, and
it can be a good moment to check and load dummynet.ko.  Ditto for
divert.  Or the inner function do_cmd() can load a missing module before
issuing setsockopt(), which is even better as we won't load the module
on a read access or if the command line contains a syntax error.
do_cmd() can even load ipfw.ko itself so that people no longer have
to type, e.g.:

	(kldload ipfw && ipfw add 65000 allow ip from any to any) > /dev/null 2>&1

instead of just:

	ipfw add 65000 allow ip from any to any

if a need to load ipfw.ko for some experiments arises.

> > The second way is to move the task of loading modules to ipfw(8).
> > Then it could load ipfw.ko, divert.ko, and dummynet.ko on demand.
>
> cheers
> luigi

--
Yar
Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network
Synopsis: [net] [patch] ifconfig may not connect an interface to known network

Responsible-Changed-From-To: freebsd-net->glebius
Responsible-Changed-By: glebius
Responsible-Changed-When: Wed Mar 14 11:18:24 UTC 2007
Responsible-Changed-Why:
I'll work on this.

http://www.freebsd.org/cgi/query-pr.cgi?pr=106722
Re: Who is to load dummynet.ko?
On Wed, Mar 14, 2007 at 12:57:26PM +0300, Yar Tikhiy wrote:
> On Tue, Mar 13, 2007 at 12:45:43AM -0700, Luigi Rizzo wrote:
...
> > Making the load on demand would require a bit of additional code because
> > it depends on the actual rules being loaded, and the rules are not
> > parsed at load time.  Plus, i believe that in a case like this
> > the decision of which modules to load should be a conscious one
> > taken upfront by the system administrator (i.e. end up in rc.conf
> > or loader.conf) rather than be the result of the actual ipfw
> > configuration.
>
> Well, I used to stick to this opinion, too, in the good old days.
> But today we are growing more and more modularity in our kernel,
> and it's a nice feature to have.  With a lot of modules, the issue
> of double configuration appears: if I want feature FOO, I have to

yes this is also true.

> add its configuration AND not forget to load the respective module.
> It can be a pain as the number of such cases rockets up.  Today at
> least mount, ifconfig, and netgraph provide for loading modules on
> demand, with the former two being system's core components.
>
> I've just taken a look at the ipfw userland utility code.  It notices
> a "pipe" or "queue" keyword in its command line rather early, and
> it can be a good moment to check and load dummynet.ko.  Ditto for

actually, i think it is the kernel itself (in the setsockopt handler,
once it validates the rule) that should load the module, and not leave
the task to the userland utility.  Other modules already do this,
e.g. iwi loads the firmware autonomously, and maybe even netgraph components
do something similar.

For dummynet and divert, this can be surely put in the setsockopt
handler which is in ipfw.ko - if you need to autoload ipfw.ko,
then i am not sure where to put the hooks (in the kernel) but i am
pretty confident that there must be a good place.
cheers
luigi
Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network
Just for the reference, here is a backtrace that shows how EEXIST
is returned:

rtrequest1(1,e6560aec,e6560ae0,e6560aec,30,...) at rtrequest1+0x658
rtinit(c3e21500,1,1) at rtinit+0x193
in_addprefix(c3e21500,1,e6560b68,0,1,...) at in_addprefix+0xa1
in_ifinit(c3c4ec00,c3e21500,c3eb6e50,0) at in_ifinit+0x761
in_control(c3f37bac,8040691a,c3eb6e40,c3c4ec00,c3e9b740) at in_control+0x93e
ifioctl(c3f37bac,8040691a,c3eb6e40,c3e9b740,0,...) at ifioctl+0x1cf
soo_ioctl(c3e5a828,8040691a,c3eb6e40,c414e000,c3e9b740) at soo_ioctl+0x2db
kern_ioctl(c3e9b740,3,8040691a,c3eb6e40) at kern_ioctl+0x296
ioctl(c3e9b740,e6560d00) at ioctl+0xf1
syscall(e6560d38) at syscall+0x242
Xint0x80_syscall() at Xint0x80_syscall+0x20

The patch proposed by Vladimir really looks like a hack.  It covers
only the case where the old route was a gateway one.  So, even with the
patch the following won't work:

	route add 10.0.0.0/24 -iface lo0
	ifconfig IFACE 10.0.0.1/24 alias

Also, I am afraid of the side effects, when a patched kernel will
substitute a route in a case where it should return an error.

AFAIK, the problem needs a more generic approach.  I see two approaches.

1) Introduce RTM_CHANGEADD, a command that will forcibly add route,
   deleting all conflicting ones.  Use this command in in_addprefix().

2) In rt_flags field we still have several extra bits.  We can use
   them to specify route source - RTS_CONNECTED, RTS_STATIC, RTS_XXX,
   where XXX is a routing protocol.  When issuing RTM_ADD a route with
   a preferred source (e.g. CONNECTED vs STATIC) will override the old
   one.

freebsd-net subscribers, what do you think?

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
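Approach 2 above can be illustrated with a small userland model. This is only a sketch under stated assumptions: the toy table, the rt_add() helper, the RTS_PROTO constant and the numeric ranking of sources are hypothetical. Only the names RTS_CONNECTED and RTS_STATIC come from the proposal, and none of this is kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ranking: a lower value is a more preferred source. */
enum rt_source { RTS_CONNECTED = 0, RTS_STATIC = 1, RTS_PROTO = 2 };

struct rt_entry {
	uint32_t	prefix;		/* network address, host byte order */
	uint8_t		plen;		/* prefix length */
	enum rt_source	src;		/* who installed the entry */
	int		used;		/* slot is occupied */
};

#define	RT_MAX	8
static struct rt_entry rt_table[RT_MAX];	/* toy table, not a radix trie */

/*
 * Add a route.  An existing entry for the same prefix is overridden only
 * by a more preferred source; otherwise EEXIST is returned.
 */
static int
rt_add(uint32_t prefix, uint8_t plen, enum rt_source src)
{
	struct rt_entry *slot = NULL;

	assert(plen <= 32);
	for (size_t i = 0; i < RT_MAX; i++) {
		struct rt_entry *e = &rt_table[i];

		if (!e->used) {
			if (slot == NULL)
				slot = e;	/* remember the first free slot */
			continue;
		}
		if (e->prefix == prefix && e->plen == plen) {
			if (src < e->src) {
				e->src = src;	/* preferred source wins */
				return (0);
			}
			return (EEXIST);
		}
	}
	if (slot == NULL)
		return (ENOBUFS);
	slot->prefix = prefix;
	slot->plen = plen;
	slot->src = src;
	slot->used = 1;
	return (0);
}
```

In this model a static route for 10.0.0.0/24 followed by a connected route for the same prefix succeeds, while adding the static route again afterwards returns EEXIST.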
Re: tap(4) should go UP if opened
Bruce M. Simpson <[EMAIL PROTECTED]> wrote on 9 Mar 2007 12:30:
> However, we also support the creation of tap/tun instances by
> non-super-users, so there is motivation for the change.  Configuring a
> tap interface to up by a non-superuser should only be permitted if the
> interface itself was created by a non-superuser, and if
> net.link.tap.user_open is set to 1.
>
> A more involved patch is needed to do this right for all cases -- we
> should not do this by default.

After thinking about the problem I agree with you and propose the
following patch:

--- sys/net/if_tap.c.orig	Thu Mar  8 19:10:59 2007
+++ sys/net/if_tap.c	Wed Mar 14 12:35:58 2007
@@ -501,6 +501,8 @@
 	s = splimp();
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
+	if (tapuopen)
+		ifp->if_flags |= IFF_UP;
 	splx(s);

 	TAPDEBUG("%s is open. minor = %#x\n", ifp->if_xname, minor(dev));

Rationale:
For transmitting packets via a tap(4) device, (at least) two conditions
have to be met:
1. The control device must be opened by a process.
2. The ethernet interface must be UP.

For 1. we allow non-root processes the access, when
 a) sysctl net.link.tap.user_open=1 AND
 b) /dev/tapx has sufficient permissions

If we have no possibility to mark the interface as UP for the non-root
process, net.link.tap.user_open=1 is useless, because we cannot
transmit any packets.  With the patch the interface goes UP only when
the administrator has allowed non-root user access.

Regards,
   Frank
--
Frank Behrens, Osterwieck, Germany
PGP-key 0x5B7C47ED on public servers available.
Re: netisr_direct
On 3/11/07, Robert Watson <[EMAIL PROTECTED]> wrote:
> Yes -- right now the in-bound TCP path is essentially serialized because
> of the tcbinfo lock.  The reason for this is that the tcbinfo lock
> doesn't just protect the inpcb chains during lookup, but also
> effectively acts as a reference to prevent the inpcb from being freed
> during input processing.  There are several ways we could start to
> reduce contention on that lock:

So, why is the tcbinfo lock being used to protect the pcb from deletion?
Why isn't the INP_LOCK on the pcb used, instead?

Keith

--
Well, I didn't find the Holy Grail, but I did find a rusty cup without
too many holes in it... -- Jeff Semke
Re: Generic ioctl and ether_ioctl don't agree
Yar Tikhiy wrote:
> Hi folks,
>
> Quite a while ago I noticed that our ioctl handlers get the ioctl
> command via u_long, but ether_ioctl()'s command argument is int.
> This disarray dates back to 1998, when ioctl functions started to
> take u_long as the command, but ether_ioctl() was never fixed.
> Fortunately, our ioctl command coding still fits in 32 bits, or
> else we would've got problems on 64-bit arch'es already.  I'd like
> to fix this long-standing bug some day after RELENG_7 is branched.
> Of course, this will break ABI to network modules on all 64-bit
> arch'es.  BTW, the same applies to other L2 layers, such as firewire,
> which seems to have been cloned from if_ethersubr.c.

This is one of those annoying things which breaks compatibility with
external modules.  I'm not sure about this, though.  I was getting sign
extension warnings on amd64 last week when I was testing the IGMPv3
aware mtest(8).

Perhaps if we're fixing these ABIs, we should commit to an explicit C99
type with known bit width, i.e. uint32_t.  I would be much happier if we
began using C99 types in the code.

Just my 2c.
BMS
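The width mismatch discussed in this thread can be modelled in a few lines of portable C. This is a sketch, not the kernel prototypes: cmd_through_int() and cmd_through_u32() are hypothetical helpers that only mimic passing a command held as unsigned long through an int parameter or through a uint32_t one. The sample value 0x8040691a is the command visible in the ifioctl() frame of the backtrace in the kern/106722 thread; its top bit is set, so it does not fit in a signed 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a callee that takes the ioctl command as int while its
 * callers hold it as unsigned long.  Hypothetical helper.
 */
static unsigned long
cmd_through_int(unsigned long cmd)
{
	int narrowed;

	assert(cmd <= UINT32_MAX);	/* premise: commands fit in 32 bits */
	narrowed = (int)cmd;		/* implementation-defined above INT_MAX */
	return ((unsigned long)narrowed);	/* sign-extends if long is 64 bits */
}

/* The same round trip through a fixed-width unsigned type. */
static unsigned long
cmd_through_u32(unsigned long cmd)
{
	uint32_t narrowed;

	assert(cmd <= UINT32_MAX);
	narrowed = (uint32_t)cmd;	/* no sign involved, value preserved */
	return ((unsigned long)narrowed);
}
```

With GCC or Clang on an LP64 platform such as amd64, cmd_through_int(0x8040691aUL) comes back as 0xffffffff8040691a, while cmd_through_u32() returns the value unchanged. The low 32 bits agree in both cases, which is consistent with the observation that things still work because commands fit in 32 bits.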
Re: netisr_direct
On Wed, 14 Mar 2007, Keith Arner wrote:

> On 3/11/07, Robert Watson <[EMAIL PROTECTED]> wrote:
> > Yes -- right now the in-bound TCP path is essentially serialized because
> > of the tcbinfo lock.  The reason for this is that the tcbinfo lock
> > doesn't just protect the inpcb chains during lookup, but also
> > effectively acts as a reference to prevent the inpcb from being freed
> > during input processing.  There are several ways we could start to
> > reduce contention on that lock:
>
> So, why is the tcbinfo lock being used to protect the pcb from deletion?
> Why isn't the INP_LOCK on the pcb used, instead?

The reasoning here is a little complex, and has to do with combining two
uses of the tcbinfo lock.  The tcbinfo lock is before the inpcb lock in
the lock order, as you need to access the tcbinfo lists in order to
acquire a reference to the inpcb.  tcp_input() will always acquire a
tcbinfo lock (whether one as today, or one of several in the future) in
order to look up the inpcb.  tcp_input() will then also acquire an inpcb
lock to protect individual connection state.  There are then two cases:
simple cases, where we know we don't need to access the lists again, and
complex cases, where we may need to access the list.  A typical example
of the former is a straight ACK in the fast path, which will modify
per-connection state only, and a typical example of the latter is a RST
where we will tear down the connection, which may remove the inpcb from
the global lists.  In the former case, we do release the tcbinfo lock
(in most cases) once we have decided that we won't need it; in the
latter case we hold it because re-acquiring the lock would require
dropping the inpcb lock for lock order reasons should the connection
close.  This is where moving to a reference count would help us: it
would allow releasing both locks while maintaining a valid pointer to
the inpcb, in turn letting us drop the tcbinfo lock and then re-acquire
it later if we do hit a connection close case.
This could use some refinement, and there are probably more cases where
we could be dropping the tcbinfo lock.

BTW, in 7.x there is significantly less contention on the pcbinfo lock
because it's no longer acquired in any of the common send and receive
paths in TCP, whereas previously it was.  This significantly lowers
contention between the upper/lower halves of the kernel: that is,
between a user thread performing send or receive on a TCP socket and
netisr processing.  In 6.x, the pcbinfo lock is used more extensively in
order to prevent the inpcb from being freed.  The change I've made in
7.x is to guarantee that so_pcb will always be valid for a properly
referenced socket, keeping the inpcb around until the socket is freed in
the case of a reset, rather than leaving the socket without the inpcb
(and hence requiring a lock to keep so_pcb valid).

Robert N M Watson
Computer Laboratory
University of Cambridge
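The reference-count idea mentioned above can be modelled in userland. This is a minimal sketch, assuming C11 atomics: struct pcb and the pcb_* helpers are hypothetical names, no real locking is shown, and it only illustrates how a held reference keeps a pointer valid after the object has been removed from the global list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for an inpcb; not the kernel structure. */
struct pcb {
	atomic_int	refs;		/* outstanding references */
	bool		on_list;	/* still reachable via the global list */
};

/* Allocate a pcb that starts out on the list, which owns one reference. */
static struct pcb *
pcb_alloc(void)
{
	struct pcb *p = calloc(1, sizeof(*p));

	assert(p != NULL);
	atomic_init(&p->refs, 1);
	p->on_list = true;
	return (p);
}

/* Take an extra reference, e.g. after a successful lookup. */
static void
pcb_hold(struct pcb *p)
{
	atomic_fetch_add(&p->refs, 1);
}

/* Drop a reference; returns true if this was the last one and p was freed. */
static bool
pcb_rele(struct pcb *p)
{
	if (atomic_fetch_sub(&p->refs, 1) == 1) {
		free(p);
		return (true);
	}
	return (false);
}

/* Remove the pcb from the list and drop the list's reference. */
static bool
pcb_unlink(struct pcb *p)
{
	assert(p->on_list);
	p->on_list = false;
	return (pcb_rele(p));
}
```

A caller that did pcb_hold() during lookup can drop every lock, and even if the connection is torn down in the meantime (pcb_unlink()), its pointer stays valid until its own pcb_rele().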
Re: tap(4) should go UP if opened
Hi,

Frank Behrens wrote:
> If we have no possibility to mark the interface as UP for the non-root
> process the net.link.tap.user_open=1 is useless, because we can not
> transmit any packets.  With the patch the interface goes UP only, when
> the administrator allowed non-root user access.

The conditional in the second patch is a no-op as the open will be
forbidden if the user did not have privilege to open the tap.  Bringing
the interface up by default potentially violates POLA, so this should
not happen by default.

Please try the attached patch, which puts this behaviour under a sysctl.

Thanks,
BMS

//depot/user/bms/netdev/sys/net/if_tap.c#1 - /home/bms/p4/netdev/sys/net/if_tap.c
--- /tmp/tmp.58336.0	Wed Mar 14 13:06:09 2007
+++ /home/bms/p4/netdev/sys/net/if_tap.c	Wed Mar 14 13:05:54 2007
@@ -150,7 +150,8 @@
  */
 static struct mtx tapmtx;
 static int tapdebug = 0;	/* debug flag */
-static int tapuopen = 0;	/* allow user open() */
+static int tapuopen = 0;	/* allow user open() */
+static int tapuponopen = 0;	/* IFF_UP on open() */
 static int tapdclone = 1;	/* enable devfs cloning */
 static SLIST_HEAD(, tap_softc) taphead;	/* first device */
 static struct clonedevs *tapclones;
@@ -164,6 +165,8 @@
     "Ethernet tunnel software network interface");
 SYSCTL_INT(_net_link_tap, OID_AUTO, user_open, CTLFLAG_RW, &tapuopen, 0,
 	"Allow user to open /dev/tap (based on node permissions)");
+SYSCTL_INT(_net_link_tap, OID_AUTO, up_on_open, CTLFLAG_RW, &tapuponopen, 0,
+	"Bring interface up when /dev/tap is opened");
 SYSCTL_INT(_net_link_tap, OID_AUTO, devfs_cloning, CTLFLAG_RW, &tapdclone, 0,
 	"Enably legacy devfs interface creation");
 SYSCTL_INT(_net_link_tap, OID_AUTO, debug, CTLFLAG_RW, &tapdebug, 0, "");
@@ -502,6 +505,8 @@
 	s = splimp();
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
+	if (tapuponopen)
+		ifp->if_flags |= IFF_UP;
 	splx(s);

 	TAPDEBUG("%s is open. minor = %#x\n", ifp->if_xname, minor(dev));
Re: [PATCH] Removal of redundant entries from ifnet manpage
Aniruddha Bohra wrote:
> Hi,
> The ifnet manpage contains entries for the following routines which
> do not exist in the ifnet struct.

committed, thanks!
Re: tap(4) should go UP if opened
Bruce, many thanks for your fast response.

Bruce M. Simpson <[EMAIL PROTECTED]> wrote on 14 Mar 2007 13:09:
> The conditional in the second patch is a no-op as the open will be
> forbidden if the user did not have privilege to open the tap.  Bringing

No.  A process running with root rights can always open the tap.

> the interface up by default potentially violates POLA, so this should
> not happen by default.

Ok, I see that the behaviour changes.  I wonder who used the "tap user
open" sysctl, when additional root rights are necessary to bring the
interface UP?  I can't imagine a setup where this could be used,
somebody else?

> Please try the attached patch, which puts this behaviour under a sysctl.

Fine!  This should work without problems.  I agree with this solution,
sounds good.  I'll test it and report the result.

Regards and thanks for your support,
   Frank
--
Frank Behrens, Osterwieck, Germany
PGP-key 0x5B7C47ED on public servers available.
Re: Who is to load dummynet.ko?
On Wed, Mar 14, 2007 at 04:35:06AM -0700, Luigi Rizzo wrote: > On Wed, Mar 14, 2007 at 12:57:26PM +0300, Yar Tikhiy wrote: > > On Tue, Mar 13, 2007 at 12:45:43AM -0700, Luigi Rizzo wrote: > ... > > > Making the load on demand would require a bit of additional code because > > > it depends on the actual rules being loaded, and the rules are not > > > parsed at load time. Plus, i believe that in a case like this > > > the decision of which modules to load should be a conscious one > > > taken upfront by the system administrator (i.e. end up in rc.conf > > > or loader.conf) rather than be the result of the actual ipfw > > > configuration. > > > > Well, I used to stick to this opinion, too, in the good old days. > > But today we are growing more and more modularity in our kernel, > > and it's a nice feature to have. With a lot of modules, the issue > > of double configuration appears: if I want feature FOO, I have to > > yes this is also try. > > > add its configuration AND not forget to load the respective module. > > It can be a pain as the number of such cases rockets up. Today at > > least mount, ifconfig, and netgraph provide for loading modules on > > demand, with the former two being system's core components. > > > > I've just taken a look at the ipfw userland utility code. It notices > > a "pipe" or "queue" keyword in its command line rather early, and > > it can be a good moment to check and load dummynet.ko. Ditto for > > actually, i think it is the kernel itself (in the setsockopt handler, > once it validates the rule) that should load the module, and not leave > the task to the userland utility. Other modules already do this, > e.g. iwi loads the firmware autonomously, and maybe even netgraph components > do something similar. 
> > For dummynet and divert, this can be surely put in the setsockopt > handler which is in ipfw.ko - if you need to autoload ipfw.ko, > then i am not sure where to put the hooks (in the kernel) but i am > pretty confident that there must be a good place. As our fortune file puts it, "If God is dead, who will save the Queen?" :-) We seem to have a sort of a chicken and egg problem here. Perhaps that's why most auto-loading is done from the userland. Of course, putting it in the kernel is better because it allows for different control tools unconcerned with the modules (imagining xipfw(8) here), but it has this shortcoming. Another funny point is that now it's impossible to unload a module auto-loaded by the kernel itself. I'm uncertain if it's an architectural limitation or "just in case". -- Yar ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Who is to load dummynet.ko?
On Wed, Mar 14, 2007 at 05:11:43PM +0300, Yar Tikhiy wrote:
> On Wed, Mar 14, 2007 at 04:35:06AM -0700, Luigi Rizzo wrote:
...
> > actually, i think it is the kernel itself (in the setsockopt handler,
> > once it validates the rule) that should load the module, and not leave
> > the task to the userland utility.  Other modules already do this,
> > e.g. iwi loads the firmware autonomously, and maybe even netgraph components
> > do something similar.
> >
> > For dummynet and divert, this can be surely put in the setsockopt
> > handler which is in ipfw.ko - if you need to autoload ipfw.ko,
> > then i am not sure where to put the hooks (in the kernel) but i am
> > pretty confident that there must be a good place.
>
> As our fortune file puts it, "If God is dead, who will save the
> Queen?" :-)  We seem to have a sort of a chicken and egg problem

not really.  IP_FW_GET (and other commands) are processed in
sys/netinet/raw_ip.c::rip_ctloutput() so it's there that we can
try and autoload the module (if ip_fw_ctl_ptr == NULL).

I don't know if there are hooks to autoload a protocol stack, as some
are not modules - no ipv4.ko, ipv6.ko, but there is arcnet.ko, but you
could in principle do the same thing with protocols and anywhere there
is a missing function, annotate it with the functions to autoload the
module supplying it.

cheers
luigi
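The suggestion above amounts to this: where a hook pointer such as ip_fw_ctl_ptr is NULL, try to load the module that supplies it, then dispatch. Below is a userland model of that pattern only. All names in it (fw_ctl_ptr, fw_ctl, load_fw_module, ctloutput_model) are hypothetical, the error codes are illustrative choices, and the real kernel module loader is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*fw_ctl_t)(int optname);

static fw_ctl_t fw_ctl_ptr = NULL;	/* stands in for ip_fw_ctl_ptr */
static int fw_load_count = 0;		/* how many times the loader ran */

/* Handler that the "module" provides once it is loaded. */
static int
fw_ctl(int optname)
{
	return (optname >= 0 ? 0 : EINVAL);
}

/* Hypothetical loader; a real one could fail, hence the return value. */
static int
load_fw_module(void)
{
	fw_load_count++;
	fw_ctl_ptr = fw_ctl;	/* the module registers its handler */
	return (0);
}

/* Dispatch a firewall socket option, loading the module on first use. */
static int
ctloutput_model(int optname)
{
	if (fw_ctl_ptr == NULL && load_fw_module() != 0)
		return (ENOPROTOOPT);
	assert(fw_ctl_ptr != NULL);
	return (fw_ctl_ptr(optname));
}
```

The first call triggers the load; later calls find the pointer set and go straight to the handler, so a control tool never needs to know which module backs the option.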
Re: Generic ioctl and ether_ioctl don't agree
On Wed, Mar 14, 2007 at 01:20:23PM +0300, Yar Tikhiy wrote:
> Hi folks,
>
> Quite a while ago I noticed that our ioctl handlers get the ioctl
> command via u_long, but ether_ioctl()'s command argument is int.
> This disarray dates back to 1998, when ioctl functions started to
> take u_long as the command, but ether_ioctl() was never fixed.
> Fortunately, our ioctl command coding still fits in 32 bits, or
> else we would've got problems on 64-bit arch'es already.  I'd like
> to fix this long-standing bug some day after RELENG_7 is branched.
> Of course, this will break ABI to network modules on all 64-bit
> arch'es.  BTW, the same applies to other L2 layers, such as firewire,
> which seems to have been cloned from if_ethersubr.c.
>
> Any objections or comments?  Thanks!

Why wait?  We're allowed to break module ABIs in current at any time and
there's no chance modules built on RELENG_6 will work on RELENG_7 trees
anyway.

-- Brooks
Re: IPv6 over gif(4) broken in 6.2-RELEASE?
Hi Tatuya,

Well after getting distracted for a while, I am back with this one.

On Fri, Jan 26, 2007 at 03:13:07AM +0900, JINMEI Tatuya wrote:
> > On Sun, 21 Jan 2007 09:32:44 +0200,
> > John Hay <[EMAIL PROTECTED]> said:
>
> >> There's another workaround for people stuck in this situation and who
> >> aren't in a position to try this diff.  That is to manually install
> >> the host route like this:
> >>
> >> # route add -host -inet6 :::::2 -interface gif0 -nostatic
> >> -llinfo
> >>
> >> Comments?
>
> > Well it seems that even my stuff does not always work perfectly with that
> > change (1.48.2.15), so maybe we should revert it and I will search for
> > yet other ways to make FreeBSD's IPv6 code to actually work for our stuff.
>
> > My "stuff" is a wireless IPv6 only network running in adhoc mode with
> > olsrd as the routing protocol.  The problem is that all nodes on a subnet
> > cannot "see" each other, so olsrd needs to add routes to a node through
> > another node.  Sometimes, just to complicate matters a little more, you
> > would want to have more than one network card in a host, all with the same
> > subnet address.  (For instance on a high site, with sector antennas.)
>
> > The case that I found that still does not work reliably, is if olsrd add
> > the route and route is not immediately used, then the nd code will time
> > it out and remove it.
>
> I think I'm responsible for the troubles.  I've been thinking about
> how to meet all the requests, and concluded that it's more complicated
> than I originally thought.
>
> I've come across an idea that may solve the problems, but I'll need
> more time to implement and test it.
>
> At the moment, I suggest reverting the 1.48.2.16 change for those who
> simply wanted a gif to work.
>
> Regarding the OLSRD stuff, I'd like to know more specific features
> that are sought.  For example,
>
> - what should happen if link-layer address resolution fails?  Should
>   the entry be removed?
>   Probably not according to your description
>   above, but what would you expect the entry to become in this case?
>
> - once the link-layer address is resolved for the entry, should it be
>   regarded as "permanent" without any ND state changes?  For example,
>   should NUD be performed on the cache?  If yes, what should happen if
>   NUD detects the neighbor is unreachable?  Should the entry be
>   removed?  Again, probably not, but then what should it become?
>   Keeping it with the same link-layer address?  Keeping it with an
>   empty link-layer address?  Others?  What if the neighbor is acting
>   as a router (setting the router flag in NAs)?  Should destination
>   caches using the now-unreachable router be removed as described in
>   the protocol spec?  Or should the destination caches be intact?
>
> I have my own speculation on these points, but I'd like to know what
> the actual user(s) of these features want before taking any action
> based on the speculation.

Maybe some background.  Olsrd (http://www.olsr.org/) does not use
link-local addresses.  I think it might have made things simpler...
Except if you kill it in a weird way, it should remove routes that it
has added, so I guess we don't really need a timeout.

I can think of 3 types of routes that olsr uses:

1) A direct route.  With a single interface, you actually would not need
these because the standard FreeBSD/Kame IPv6 code would handle it.  The
problem comes in when you have more than one interface (and in the same
subnet).  I think it would be great if I can just tell the kernel which
interface to use and let it do all the normal IPv6 stuff to make comms
work to that host, but do it on the specified interface.  If it then
times out the low-level stuff because of comms problems, that is fine,
as long as it remembers which interface to use when it has to try again.

Let me try a picture:

An olsr router may have more than one wireless interface to cover
different areas.
In this example ath0 and ath1 are configured with the same IPv6
subnet, eg. fd12:3:4:5::/48

[ASCII diagram garbled in the archive: router D with two wireless
interfaces, ath0 and ath1, and nodes A, B and C]

Then the client might be somewhere between A and B and sometimes work
through ath0 and sometimes through ath1.  Olsrd must be able to tell the
kernel which interface to use.  And it must keep on using that interface
until olsrd delete that route and add another one.

2) A host route through another host.  In my above picture, C might be
too far to reach D directly, so it will need to add a route through B to
Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network
Gleb Smirnoff wrote:
> AFAIK, the problem needs a more generic approach.  I see two approaches.
>
> 1) Introduce RTM_CHANGEADD, a command that will forcibly add route,
>    deleting all conflicting ones.  Use this command in in_addprefix().
>
> 2) In rt_flags field we still have several extra bits.  We can use
>    them to specify route source - RTS_CONNECTED, RTS_STATIC, RTS_XXX,
>    where XXX is a routing protocol.  When issuing RTM_ADD a route with
>    a preferred source (e.g. CONNECTED vs STATIC) will override the old
>    one.

The proposed changes also constitute a hack.

I understand that they are being proposed to address problems we
currently have in the stack, i.e. that we do not support multipathing,
though it is more than likely they will be blown away in future when the
architecture changes (and it has to change).

Approach 1 is largely irrelevant if multiple paths are introduced to the
network stack; there is then no concept of a conflicting forwarding
entry, only preference derived from the interface, entry flags, or the
entry ('route') itself.

Approach 2 has some merit to it, although the forwarding plane should
not care where the forwarding entry came from unless it needs to (e.g.
next-hop resolution).

It seems reasonable that the forwarding plane should tag entries as
being 'CONNECTED' i.e. derived from the address configuration of an
interface.  I believe many implementations out there do this, and
multi-path does not change this.

We already have the RTF_PROTO1 flag to determine if the forwarding entry
('route') came from a routing protocol in userland, so there should be
no need to change the existing flags.

The RTF_STATIC flag only has special meaning in that it means 'the user
added this forwarding entry manually via the route(8) command'.  We
should preserve these semantics, though I believe we should start
implementing forwarding preference in the radix trie.
I think it seems acceptable and reasonable that we use a limited form of
Approach 2 to clobber 'routes' being added in the case described in the
PR, until such time as the network stack is re-engineered to support
multiple paths and forwarding preference.

I also believe it is useful if we start to use more modern technical
jargon to discuss 'routes' in the network stack, because we are actually
discussing the behaviour of entries in a forwarding table.

Regards,
BMS
Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network
On Wed, Mar 14, 2007 at 04:00:13PM +, Bruce M. Simpson wrote:
B> The proposed changes also constitute a hack.
B>
B> I understand that they are being proposed to address problems we
B> currently have in the stack, i.e. that we do not support multipathing,
B> though it is more than likely they will be blown away in future when the
B> architecture changes (and it has to change).
B>
B> Approach 1 is largely irrelevant if multiple paths are introduced to the
B> network stack; there is then no concept of a conflicting forwarding
B> entry, only preference derived from the interface, entry flags, or the
B> entry ('route') itself.
B>
B> Approach 2 has some merit to it, although the forwarding plane should
B> not care where the forwarding entry came from unless it needs to (e.g.
B> next-hop resolution).
B>
B> It seems reasonable that the forwarding plane should tag entries as
B> being 'CONNECTED' i.e. derived from the address configuration of an
B> interface. I believe many implementations out there do this, and
B> multi-path does not change this.
B>
B> We already have the RTF_PROTO1 flag to determine if the forwarding entry
B> ('route') came from a routing protocol in userland, so there should be
B> no need to change the existing flags.
B>
B> The RTF_STATIC flag only has special meaning in that it means 'the user
B> added this forwarding entry manually via the route(8) command'. We
B> should preserve these semantics, though I believe we should start
B> implementing forwarding preference in the radix trie.
B>
B> I think it seems acceptable and reasonable that we use a limited form of
B> Approach 2 to clobber 'routes' being added in the case described in the
B> PR, until such time as the network stack is re-engineered to support
B> multiple paths and forwarding preference.
B>
B> I also believe it is useful if we start to use more modern technical
B> jargon to discuss 'routes' in the network stack, because we are actually
B> discussing the behaviour of entries in a forwarding table.

I was afraid that this would raise an argument about multipath routing. Let's temporarily leave multipath aside and just decide: what is the correct way to remove conflicting routes when we are assigning an IP prefix to a local interface?

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network
Gleb Smirnoff wrote:
> I was afraid that this would raise an argument about multipath routing.
> Let's temporarily leave multipath aside and just decide: what is the
> correct way to remove conflicting routes when we are assigning an IP
> prefix to a local interface?

My suggestion is to take the second approach you outlined but modify it slightly. That way, the conflict between the 'connected' forwarding table entry (FTE) introduced by ifconfig'ing the interface and the pre-existing FTE for that network prefix may be resolved in a manner which doesn't break current consumers of the routing code, and leaves the way open to do multipath later without problems.

Regards,
BMS
Re[2]: kern/106722: [net] [patch] ifconfig may not connect an interface to known network
Wednesday, March 14, 2007, 7:00:13 PM, Bruce M. Simpson wrote:

BMS> Gleb Smirnoff wrote:
>> AFAIK, the problem needs a more generic approach. I see two approaches.
>>
>> 1) Introduce RTM_CHANGEADD, a command that will forcibly add route,
>> deleting all conflicting ones. Use this command in in_addprefix().
>>
>> 2) In rt_flags field we still have several extra bits. We can use
>> them to specify route source - RTS_CONNECTED, RTS_STATIC, RTS_XXX,
>> where XXX is a routing protocol. When issuing RTM_ADD a route with
>> a preferred source (e.g. CONNECTED vs STATIC) will override the old
>> one.
>>
BMS> I understand that they are being proposed to address problems we
BMS> currently have in the stack, i.e. that we do not support multipathing,
BMS> though it is more than likely they will be blown away in future when the
BMS> architecture changes (and it has to change).

IMHO the question is not related to multipathing. Kernel routes currently don't carry an administrative distance (AD), and that is the root of this problem. RTS_CONNECTED and RTS_STATIC are a hack that adds some fixed AD values without increasing the size of a route in memory.

--
WBR, Anton Yuzhaninov
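To make the second approach concrete, here is a minimal userland sketch of "a route with a preferred source overrides the old one". It is not kernel code and not the patch from the PR: the struct, the table, the function names and the numeric ordering of the sources are all assumptions made for illustration. Only the names RTS_CONNECTED and RTS_STATIC come from the proposal quoted above, and they are proposed values, not existing flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route sources; a lower value is preferred, in the
 * spirit of an administrative distance. The ordering is an assumption. */
enum rt_source {
    RTS_CONNECTED = 0,  /* derived from an interface address */
    RTS_STATIC    = 1,  /* added by hand, e.g. via route(8) */
    RTS_PROTO     = 2   /* stand-in for "some routing protocol" */
};

/* Hypothetical forwarding table entry, not the kernel's struct rtentry. */
struct fte {
    uint32_t prefix;
    uint8_t masklen;
    enum rt_source source;
    int in_use;
};

enum fte_result { FTE_ADDED, FTE_REPLACED, FTE_EXISTS, FTE_FULL };

#define FTE_TABLE_SIZE 16
static struct fte fte_table[FTE_TABLE_SIZE];

/* Exact-match lookup on (prefix, masklen); a real table would use a trie. */
static struct fte *fte_lookup(uint32_t prefix, uint8_t masklen)
{
    for (size_t i = 0; i < FTE_TABLE_SIZE; i++) {
        struct fte *e = &fte_table[i];
        if (e->in_use && e->prefix == prefix && e->masklen == masklen)
            return e;
    }
    return NULL;
}

/* Add an entry. If one already exists for the same prefix, the new entry
 * wins only when its source is strictly preferred over the old one. */
enum fte_result fte_add(uint32_t prefix, uint8_t masklen, enum rt_source source)
{
    assert(masklen <= 32);
    struct fte *e = fte_lookup(prefix, masklen);
    if (e != NULL) {
        if (source < e->source) {
            e->source = source;     /* clobber the less-preferred entry */
            return FTE_REPLACED;
        }
        return FTE_EXISTS;          /* keep the existing entry */
    }
    for (size_t i = 0; i < FTE_TABLE_SIZE; i++) {
        if (!fte_table[i].in_use) {
            fte_table[i] = (struct fte){ prefix, masklen, source, 1 };
            return FTE_ADDED;
        }
    }
    return FTE_FULL;
}
```

With this ordering, adding a connected entry for a prefix that already has a static entry replaces it, while adding a protocol-learned entry over a static one is refused. Whether equal sources should replace or be rejected is a policy choice the thread does not settle; the sketch rejects them.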
fbsd amd64 and fast_ipsec
OK, I have used these kernel mods for my file server/NAT/router/firewall servers for years. (Kernel options first, then the question.)

#Mt e
options IPFIREWALL
options IPDIVERT
options IPFIREWALL_VERBOSE
options IPSTEALTH
options FAST_IPSEC
#smbfs stuff
options NETSMB
options NETSMBCRYPTO
options LIBMCHAIN
options LIBICONV
options SMBFS
#end smbfs stuff
device crypto

With 6.2, updated to the latest sources (3.13.07) with

cvsup -L 2 -h `(fastest_cvsup -q -c us )` /root/stable-supfile

followed by make buildworld etc., I STILL cannot get setkey or racoon to function. I keep getting a pfkey error and cannot establish a VPN tunnel. I can if I use:

options IPSEC
options IPSEC_ESP
options IPSEC_DEBUG

but then the Giant mutex is enabled, etc. However, all was good up to 6.1. Is this broken in 6.2?
Scalability problem from route refcounting
I have recently started looking at database performance over gigabit ethernet, and there seems to be a bottleneck coming from the way route reference counting is implemented. On an 8-core system it looks like we spend a lot of time waiting for the rtentry mutex:

   max     total  wait_total    count  avg  wait_avg  cnt_hold  cnt_lock  name
[...]
   408    950496     1135994   301418    3         3     24876     55936  net/if_ethersubr.c:397 (sleep mutex:bge1)
   974    968617     1515169   253772    3         5     14741     60581  dev/bge/if_bge.c:2949 (sleep mutex:bge1)
  2415  18255976     1607511   253841   71         6    125174      3131  netinet/tcp_input.c:770 (sleep mutex:inp)
   233   1850252     2080506   141817   13        14         0    126897  netinet/tcp_usrreq.c:756 (sleep mutex:inp)
   384   6895050     2737492   299002   23         9     92100     73942  dev/bge/if_bge.c:3506 (sleep mutex:bge1)
   626   5342286     2760193   301477   17         9     47616     54158  net/route.c:147 (sleep mutex:radix node head)
   326   3562050     3381510   301477   11        11    133968    110104  net/route.c:197 (sleep mutex:rtentry)
   146    947173     5173813   301477    3        17     44578    120961  net/route.c:1290 (sleep mutex:rtentry)
   146    953718     5501119   301476    3        18     63285    121819  netinet/ip_output.c:610 (sleep mutex:rtentry)
    50   4530645     7885304  1423098    3         5    642391    788230  kern/subr_turnstile.c:489 (spin mutex:turnstile chain)

i.e. during a 30 second sample we spend a total of >14 seconds (on all cpus) waiting to acquire the rtentry mutex.

This appears to be because (among other things), we increment and then decrement the route refcount for each packet we send, each of which requires acquiring the rtentry mutex for that route before adjusting the refcount. So multiplexing traffic for lots of connections over a single route is being partly rate-limited by those mutex operations.

This is not the end of the story though, the bge driver is a serious bottleneck on its own (e.g. I nulled out the route locking since it is not relevant in my environment, at least for the purposes of this test, and that exposed bge as the next problem -- but other drivers may not be so bad).

Kris
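The per-packet pattern described above can be sketched in userland as follows. This is an illustration only, not the code in net/route.c or netinet/ip_output.c: struct rt_sketch and every function name are hypothetical, a pthread mutex stands in for the rtentry mutex, and the loop is single-threaded, so it counts lock acquisitions rather than modelling contention. The last helper adds up the three rtentry wait_total figures from the profile; reading them as microseconds is an assumption, chosen because it agrees with the ">14 seconds" stated in the message.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a route entry; not the kernel's struct rtentry. */
struct rt_sketch {
    pthread_mutex_t lock;     /* plays the role of the rtentry mutex */
    long refcnt;              /* reference count protected by lock */
    unsigned long lock_ops;   /* number of times lock has been acquired */
};

/* Take a reference: one mutex acquisition. */
static void rt_ref(struct rt_sketch *rt)
{
    pthread_mutex_lock(&rt->lock);
    rt->refcnt++;
    rt->lock_ops++;
    pthread_mutex_unlock(&rt->lock);
}

/* Drop a reference: a second mutex acquisition. */
static void rt_unref(struct rt_sketch *rt)
{
    pthread_mutex_lock(&rt->lock);
    rt->refcnt--;
    rt->lock_ops++;
    pthread_mutex_unlock(&rt->lock);
}

/* One "send": reference the route, pretend to transmit, drop the reference. */
static void send_packet(struct rt_sketch *rt)
{
    rt_ref(rt);
    /* transmission would happen here */
    rt_unref(rt);
}

/* Send npackets over a single shared route and report how many times the
 * route's mutex was acquired along the way. */
unsigned long lock_ops_for(unsigned long npackets)
{
    struct rt_sketch rt;
    unsigned long ops;

    pthread_mutex_init(&rt.lock, NULL);
    rt.refcnt = 0;
    rt.lock_ops = 0;
    for (unsigned long i = 0; i < npackets; i++)
        send_packet(&rt);
    assert(rt.refcnt == 0);   /* every reference was dropped again */
    ops = rt.lock_ops;
    pthread_mutex_destroy(&rt.lock);
    return ops;
}

/* Sum of wait_total over the three rtentry rows of the profile
 * (route.c:197, route.c:1290, ip_output.c:610); unit assumed to be us. */
unsigned long rtentry_wait_total(void)
{
    return 3381510UL + 5173813UL + 5501119UL;
}
```

Every packet costs two acquisitions of the same mutex, so all senders sharing one route serialize on it. The sum of the three rtentry rows is 14056442, i.e. roughly 14 seconds if the unit is microseconds.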
Re: Scalability problem from route refcounting
Apologies in advance if you have already answered this question elsewhere - can you point me to a HOWTO for replicating the test in my local environment?

-Kip

On 3/14/07, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> I have recently started looking at database performance over gigabit
> ethernet, and there seems to be a bottleneck coming from the way route
> reference counting is implemented. On an 8-core system it looks like
> we spend a lot of time waiting for the rtentry mutex:
>
>    max     total  wait_total    count  avg  wait_avg  cnt_hold  cnt_lock  name
> [...]
>    408    950496     1135994   301418    3         3     24876     55936  net/if_ethersubr.c:397 (sleep mutex:bge1)
>    974    968617     1515169   253772    3         5     14741     60581  dev/bge/if_bge.c:2949 (sleep mutex:bge1)
>   2415  18255976     1607511   253841   71         6    125174      3131  netinet/tcp_input.c:770 (sleep mutex:inp)
>    233   1850252     2080506   141817   13        14         0    126897  netinet/tcp_usrreq.c:756 (sleep mutex:inp)
>    384   6895050     2737492   299002   23         9     92100     73942  dev/bge/if_bge.c:3506 (sleep mutex:bge1)
>    626   5342286     2760193   301477   17         9     47616     54158  net/route.c:147 (sleep mutex:radix node head)
>    326   3562050     3381510   301477   11        11    133968    110104  net/route.c:197 (sleep mutex:rtentry)
>    146    947173     5173813   301477    3        17     44578    120961  net/route.c:1290 (sleep mutex:rtentry)
>    146    953718     5501119   301476    3        18     63285    121819  netinet/ip_output.c:610 (sleep mutex:rtentry)
>     50   4530645     7885304  1423098    3         5    642391    788230  kern/subr_turnstile.c:489 (spin mutex:turnstile chain)
>
> i.e. during a 30 second sample we spend a total of >14 seconds (on all
> cpus) waiting to acquire the rtentry mutex.
>
> This appears to be because (among other things), we increment and then
> decrement the route refcount for each packet we send, each of which
> requires acquiring the rtentry mutex for that route before adjusting
> the refcount. So multiplexing traffic for lots of connections over a
> single route is being partly rate-limited by those mutex operations.
>
> This is not the end of the story though, the bge driver is a serious
> bottleneck on its own (e.g. I nulled out the route locking since it is
> not relevant in my environment, at least for the purposes of this
> test, and that exposed bge as the next problem -- but other drivers
> may not be so bad).
>
> Kris
Re: Scalability problem from route refcounting
On Wed, Mar 14, 2007 at 06:45:20PM -0700, Kip Macy wrote:
> Apologies in advance if you have already answered this question
> elsewhere - can you point me to a HOWTO for replicating the test in my
> local environment?

Well, it's not completely spelled out, but most of the steps are documented in

http://people.freebsd.org/~kris/scaling/mysql.html

and references therein. You'll probably want to use my kris-contention p4 branch to avoid the scaling bottlenecks we have identified and solved so far.

Kris