Generic ioctl and ether_ioctl don't agree

2007-03-14 Thread Yar Tikhiy
Hi folks,

Quite a while ago I noticed that our ioctl handlers get the ioctl
command via u_long, but ether_ioctl()'s command argument is int.
This disarray dates back to 1998, when ioctl functions started to
take u_long as the command, but ether_ioctl() was never fixed.
Fortunately, our ioctl command coding still fits in 32 bits, or
else we would've got problems on 64-bit arch'es already.  I'd like
to fix this long-standing bug some day after RELENG_7 is branched.
Of course, this will break ABI to network modules on all 64-bit
arch'es.  BTW, the same applies to other L2 layers, such as firewire,
which seems to have been cloned from if_ethersubr.c.
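
To make the hazard concrete, here is a minimal sketch in plain C.  It
models a 64-bit u_long with uint64_t and a 32-bit int with int32_t, so
the effect is the same on any platform.  cmd_through_int() is a
hypothetical name used only for illustration, not a kernel function, and
the narrowing conversion behaves this way on common compilers rather than
by ISO C guarantee.

```c
#include <stdint.h>

/*
 * Hypothetical model of the mismatch: the generic handler holds the
 * command as a 64-bit value, hands it to a callee that takes a 32-bit
 * int, and the callee widens it again (e.g. to compare it with a
 * u_long constant).  This is not the real ether_ioctl() code.
 */
static uint64_t
cmd_through_int(uint64_t cmd)
{
	/* Narrowing keeps only the low 32 bits. */
	int32_t narrowed = (int32_t)(uint32_t)cmd;

	/* Widening a negative int sign-extends into the upper 32 bits. */
	return ((uint64_t)(int64_t)narrowed);
}
```

Under these assumptions a command such as 0x8040691a, which has the top
bit of its low 32 bits set, would come back as 0xffffffff8040691a, while
any bits above the low 32 would simply be lost.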

Any objections or comments?  Thanks!

-- 
Yar
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Who is to load dummynet.ko?

2007-03-14 Thread Yar Tikhiy
On Tue, Mar 13, 2007 at 12:45:43AM -0700, Luigi Rizzo wrote:
> On Sat, Mar 10, 2007 at 06:35:34PM +0300, Yar Tikhiy wrote:
> > Hi folks,
> > 
> > Just noticed that neither ipfw(8) nor /etc/rc.d/ipfw cares to load
> > dummynet.ko.  It can result in a broken setup when one migrates
> > from a custom monolithic kernel to GENERIC with modules, which is
> > a nice way to reduce support headache today.
> > 
> > There are at least two possible ways to deal with the issue.  The
> > easy way is to give the task of loading dummynet.ko to /etc/rc.d/ipfw.
> > The problem with it is that the script cannot know in advance if
> > dummynet is really used by the ipfw rules to be loaded.  The decision
> > whether to load the module is left to rc.conf(5) in this case.
> 
> i think this is a reasonable option and the one to use for all ipfw
> extensions (divert, dummynet, in-kernel nat and so on).

Thank you for your reply!  Loading dummynet via rc.d is really easy,
I think I can add it.  However, I'm afraid it won't be a real
long-term solution -- please see below.

> Making the load on demand would require a bit of additional code because
> it depends on the actual rules being loaded, and the rules are not
> parsed at load time. Plus, i believe that in a case like this
> the decision of which modules to load should be a conscious one
> taken upfront by the system administrator (i.e. end up in rc.conf
> or loader.conf) rather than be the result of the actual ipfw
> configuration.

Well, I used to stick to this opinion, too, in the good old days.
But today we are growing more and more modularity in our kernel,
and it's a nice feature to have.  With a lot of modules, the issue
of double configuration appears: if I want feature FOO, I have to
add its configuration AND not forget to load the respective module.
It can be a pain as the number of such cases rockets up.  Today at
least mount, ifconfig, and netgraph provide for loading modules on
demand, with the former two being system's core components.

I've just taken a look at the ipfw userland utility code.  It notices
a "pipe" or "queue" keyword in its command line rather early, and
it can be a good moment to check and load dummynet.ko.  Ditto for
divert.  Or the inner function do_cmd() can load a missing module
before issuing setsockopt(), which is even better as we won't load
the module on a read access or if the command line contains a syntax
error.  do_cmd() can even load ipfw.ko itself so that people no longer
have to type, e.g.:

(kldload ipfw && ipfw add 65000 allow ip from any to any) > /dev/null 2>&1

instead of just:

ipfw add 65000 allow ip from any to any

if a need to load ipfw.ko for some experiments arises.
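
As a rough sketch of the keyword check, assuming a helper that does not
exist in the real ipfw(8) source: module_for_keyword() is a hypothetical
name, and the module file names are simply the ones used in this thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map an ipfw command-line keyword to the kernel
 * module file it would need.  Returns NULL when no extra module is
 * required.  File names are taken from this thread, not verified.
 */
static const char *
module_for_keyword(const char *kw)
{
	if (strcmp(kw, "pipe") == 0 || strcmp(kw, "queue") == 0)
		return ("dummynet.ko");
	if (strcmp(kw, "divert") == 0)
		return ("divert.ko");
	return (NULL);	/* plain rule, nothing more to load */
}
```

A caller such as do_cmd() could consult this just before setsockopt()
and, on FreeBSD, probably use modfind(2) and kldload(2) to check for and
load the module; that part is left out here because it is system
specific.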

> > The second way is to move the task of loading modules to ipfw(8).
> > Then it could load ipfw.ko, divert.ko, and dummynet.ko on demand.
> 
> cheers
> luigi

-- 
Yar


Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network

2007-03-14 Thread Gleb Smirnoff
Synopsis: [net] [patch] ifconfig may not connect an interface to known network

Responsible-Changed-From-To: freebsd-net->glebius
Responsible-Changed-By: glebius
Responsible-Changed-When: Wed Mar 14 11:18:24 UTC 2007
Responsible-Changed-Why: 
I'll work on this.

http://www.freebsd.org/cgi/query-pr.cgi?pr=106722


Re: Who is to load dummynet.ko?

2007-03-14 Thread Luigi Rizzo
On Wed, Mar 14, 2007 at 12:57:26PM +0300, Yar Tikhiy wrote:
> On Tue, Mar 13, 2007 at 12:45:43AM -0700, Luigi Rizzo wrote:
...
> > Making the load on demand would require a bit of additional code because
> > it depends on the actual rules being loaded, and the rules are not
> > parsed at load time. Plus, i believe that in a case like this
> > the decision of which modules to load should be a conscious one
> > taken upfront by the system administrator (i.e. end up in rc.conf
> > or loader.conf) rather than be the result of the actual ipfw
> > configuration.
> 
> Well, I used to stick to this opinion, too, in the good old days.
> But today we are growing more and more modularity in our kernel,
> and it's a nice feature to have.  With a lot of modules, the issue
> of double configuration appears: if I want feature FOO, I have to

yes this is also true.

> add its configuration AND not forget to load the respective module.
> It can be a pain as the number of such cases rockets up.  Today at
> least mount, ifconfig, and netgraph provide for loading modules on
> demand, with the former two being system's core components.
> 
> I've just taken a look at the ipfw userland utility code.  It notices
> a "pipe" or "queue" keyword in its command line rather early, and
> it can be a good moment to check and load dummynet.ko.  Ditto for

actually, i think it is the kernel itself (in the setsockopt handler,
once it validates the rule) that should load the module, and not leave
the task to the userland utility. Other modules already do this,
e.g. iwi loads the firmware autonomously, and maybe even netgraph components
do something similar.

For dummynet and divert, this can be surely put in the setsockopt
handler which is in ipfw.ko - if you need to autoload ipfw.ko,
then i am not sure where to put the hooks (in the kernel) but i am
pretty confident that there must be a good place.

cheers
luigi


Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network

2007-03-14 Thread Gleb Smirnoff
  Just for reference, here is a backtrace that shows how EEXIST is
returned:

rtrequest1(1,e6560aec,e6560ae0,e6560aec,30,...) at rtrequest1+0x658
rtinit(c3e21500,1,1) at rtinit+0x193
in_addprefix(c3e21500,1,e6560b68,0,1,...) at in_addprefix+0xa1
in_ifinit(c3c4ec00,c3e21500,c3eb6e50,0) at in_ifinit+0x761
in_control(c3f37bac,8040691a,c3eb6e40,c3c4ec00,c3e9b740) at in_control+0x93e
ifioctl(c3f37bac,8040691a,c3eb6e40,c3e9b740,0,...) at ifioctl+0x1cf
soo_ioctl(c3e5a828,8040691a,c3eb6e40,c414e000,c3e9b740) at soo_ioctl+0x2db
kern_ioctl(c3e9b740,3,8040691a,c3eb6e40) at kern_ioctl+0x296
ioctl(c3e9b740,e6560d00) at ioctl+0xf1
syscall(e6560d38) at syscall+0x242
Xint0x80_syscall() at Xint0x80_syscall+0x20

  The patch proposed by Vladimir really looks like a hack. It covers only the
case where the old route was a gateway route. So, even with the patch, the
following won't work:

route add 10.0.0.0/24 -iface lo0
ifconfig IFACE 10.0.0.1/24 alias

Also, I am afraid of side effects: the patched kernel may replace a
route in a case where it should return an error.

AFAIK, the problem needs a more generic approach. I see two approaches.

1) Introduce RTM_CHANGEADD, a command that will forcibly add route,
deleting all conflicting ones. Use this command in in_addprefix().

2) In rt_flags field we still have several extra bits. We can use
them to specify route source - RTS_CONNECTED, RTS_STATIC, RTS_XXX,
where XXX is a routing protocol. When issuing RTM_ADD a route with
a preferred source (e.g. CONNECTED vs STATIC) will override the old
one.
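
A minimal sketch of how approach 2 could behave, assuming a single route
slot and only two sources.  The RTS_* names come from the proposal above
and do not exist in the tree; fake_rtm_add(), struct fake_route and the
choice that CONNECTED outranks STATIC are assumptions for illustration.

```c
#include <errno.h>

/* Hypothetical route sources; a larger value is more preferred. */
enum rt_source { RTS_STATIC = 1, RTS_CONNECTED = 2 };

struct fake_route {
	int		present;	/* 0 while the slot is empty */
	enum rt_source	source;		/* who installed the entry */
};

/*
 * Model of RTM_ADD under approach 2: a more preferred source replaces
 * the existing entry; anything else still gets EEXIST.
 */
static int
fake_rtm_add(struct fake_route *slot, enum rt_source src)
{
	if (slot->present && src <= slot->source)
		return (EEXIST);
	slot->present = 1;
	slot->source = src;
	return (0);
}
```

With this ordering, adding a connected route over an existing static one
succeeds, while a second static route, or a static route over a
connected one, is still rejected with EEXIST.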

freebsd-net subscribers, what do you think?

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE


Re: tap(4) should go UP if opened

2007-03-14 Thread Frank Behrens
Bruce M. Simpson <[EMAIL PROTECTED]> wrote on 9 Mar 2007 12:30:
> However, we also support the creation of tap/tun instances by 
> non-super-users, so there is motivation for the change. Configuring a 
> tap interface to up by a non-superuser should only be permitted if the 
> interface itself was created by a non-superuser, and if 
> net.link.tap.user_open is set to 1.
> 
> A more involved patch is needed to do this right for all cases -- we 
> should not do this by default.

After thinking about the problem I agree with you and propose the following 
patch:
--- sys/net/if_tap.c.orig	Thu Mar  8 19:10:59 2007
+++ sys/net/if_tap.c	Wed Mar 14 12:35:58 2007
@@ -501,6 +501,8 @@
 	s = splimp();
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
+	if (tapuopen)
+		ifp->if_flags |= IFF_UP;
 	splx(s);

 	TAPDEBUG("%s is open. minor = %#x\n", ifp->if_xname, minor(dev));

Rationale:
For transmitting packets via a tap(4) device, (at least) two conditions have
to be met:
1. The control device must be opened by a process.
2. The ethernet interface must be UP.

For 1. we allow non-root processes the access, when
a) sysctl net.link.tap.user_open=1   AND
b) /dev/tapx has sufficient permissions

If a non-root process has no way to mark the interface as UP, then
net.link.tap.user_open=1 is useless, because we cannot transmit any packets.
With the patch the interface goes UP only when the administrator has allowed
non-root user access.
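
The rationale can be modeled in a few lines of C.  This is only a toy
model, assuming two booleans stand in for the device and interface
state; struct fake_tap and both functions are hypothetical and are not
part of if_tap.c.

```c
#include <stdbool.h>

/* Toy model of a tap device: the two conditions from the rationale. */
struct fake_tap {
	bool	opened;	/* control device opened by a process */
	bool	up;	/* ethernet interface marked UP */
};

/* Model of the open path; user_open mirrors net.link.tap.user_open. */
static void
fake_tap_open(struct fake_tap *t, bool user_open)
{
	t->opened = true;
	if (user_open)		/* the two lines the patch adds */
		t->up = true;
}

/* Packets can be transmitted only when both conditions hold. */
static bool
fake_tap_can_transmit(const struct fake_tap *t)
{
	return (t->opened && t->up);
}
```

In this model, an open with the sysctl off leaves the device unable to
transmit until someone else marks the interface UP, while an open with
the sysctl on is sufficient by itself.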

Regards,
   Frank
-- 
Frank Behrens, Osterwieck, Germany
PGP-key 0x5B7C47ED on public servers available.



Re: netisr_direct

2007-03-14 Thread Keith Arner

On 3/11/07, Robert Watson <[EMAIL PROTECTED]> wrote:

> Yes -- right now the in-bound TCP path is essentially serialized because
> of the tcbinfo lock.  The reason for this is that the tcbinfo lock doesn't
> just protect the inpcb chains during lookup, but also effectively acts as a
> reference to prevent the inpcb from being freed during input processing.
> There are several ways we could start to reduce contention on that lock:

So, why is the tcbinfo lock being used to protect the pcb from deletion?
Why isn't the INP_LOCK on the pcb used, instead?

Keith

--
Well,  I didn't find the Holy Grail,
 but I did find a rusty cup without too many holes in it...
-- Jeff Semke


Re: Generic ioctl and ether_ioctl don't agree

2007-03-14 Thread Bruce M. Simpson

Yar Tikhiy wrote:
> Hi folks,
> 
> Quite a while ago I noticed that our ioctl handlers get the ioctl
> command via u_long, but ether_ioctl()'s command argument is int.
> This disarray dates back to 1998, when ioctl functions started to
> take u_long as the command, but ether_ioctl() was never fixed.
> Fortunately, our ioctl command coding still fits in 32 bits, or
> else we would've got problems on 64-bit arch'es already.  I'd like
> to fix this long-standing bug some day after RELENG_7 is branched.
> Of course, this will break ABI to network modules on all 64-bit
> arch'es.  BTW, the same applies to other L2 layers, such as firewire,
> which seems to have been cloned from if_ethersubr.c.

This is one of those annoying things which breaks compatibility with 
external modules.


I'm not sure about this, though. I was getting sign extension warnings 
on amd64 last week when I was testing the IGMPv3 aware mtest(8). Perhaps 
if we're fixing these ABIs, we should commit to an explicit C99 type 
with known bit width, i.e. uint32_t.


I would be much happier if we began using C99 types in the code.

Just my 2c.
BMS


Re: netisr_direct

2007-03-14 Thread Robert Watson


On Wed, 14 Mar 2007, Keith Arner wrote:
> On 3/11/07, Robert Watson <[EMAIL PROTECTED]> wrote:
> > Yes -- right now the in-bound TCP path is essentially serialized because of
> > the tcbinfo lock.  The reason for this is that the tcbinfo lock doesn't
> > just protect the inpcb chains during lookup, but also effectively acts as a
> > reference to prevent the inpcb from being freed during input processing.
> > There are several ways we could start to reduce contention on that lock:
> 
> So, why is the tcbinfo lock being used to protect the pcb from deletion? Why
> isn't the INP_LOCK on the pcb used, instead?


The reasoning here is a little complex, and has to do with combining two uses 
of the tcbinfo lock.  The tcbinfo lock is before the inpcb lock in the lock 
order, as you need to access the tcbinfo lists in order to acquire a reference 
to the inpcb.  tcp_input() will always acquire a tcbinfo lock (whether one as 
today, or one of several in the future) in order to look up the inpcb. 
tcp_input() will then also acquire an inpcb lock to protect individual 
connection state.


There are then two cases: simple cases, where we know we don't need to access 
the lists again, and then complex cases where we may need to access the list. 
A typical example of the former is a straight ACK in the fast path, which will 
modify per-connection state only, and a typical example of the latter is a RST 
where we will tear down connection, which may remove the inpcb from the global 
lists.  In the former case, we do release the tcbinfo lock (in most cases) 
once we have decided that we won't need it; in the latter case we hold it 
because re-acquiring the lock would require dropping the inpcb lock for lock 
order reasons should the connection close.  This is where moving to a 
reference count would help us: it would allow releasing both locks while 
maintaining a valid pointer to the inpcb, in turn letting us drop the tcbinfo 
lock and then re-acquire it later if we do hit a connection close case.  This 
could use some refinement, and there are probably more cases where we could 
drop the tcbinfo lock.
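
As a rough illustration of the reference count idea, here is a minimal
userland sketch using C11 atomics.  struct fake_inpcb and the three
functions are hypothetical names; the real inpcb, its locks and its
lifetime rules are considerably more involved.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy stand-in for an inpcb that is kept alive by a reference count. */
struct fake_inpcb {
	atomic_uint	refs;
};

/* Allocate with one reference, as if held by the global lists. */
static struct fake_inpcb *
fake_inpcb_alloc(void)
{
	struct fake_inpcb *inp = malloc(sizeof(*inp));

	if (inp != NULL)
		atomic_init(&inp->refs, 1);
	return (inp);
}

/* Take an extra reference, e.g. after the lookup in the input path. */
static void
fake_inpcb_ref(struct fake_inpcb *inp)
{
	atomic_fetch_add(&inp->refs, 1);
}

/* Drop a reference; returns 1 if this was the last one and it was freed. */
static int
fake_inpcb_rele(struct fake_inpcb *inp)
{
	if (atomic_fetch_sub(&inp->refs, 1) == 1) {
		free(inp);
		return (1);
	}
	return (0);
}
```

The point of the sketch is only that a holder of a reference keeps the
pointer valid without holding any lock, so both locks could be dropped
and the list lock re-acquired later in the close case.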


BTW, in 7.x there is significantly less contention on the pcbinfo lock because 
it's no longer acquired in any of the common send and receive paths in TCP, 
whereas previously it was.  This significantly lowers contention between the 
upper/lower halves of the kernel: that is, between a user thread performing 
send or receive on a TCP socket and netisr processing.  In 6.x, the pcbinfo 
lock is used more extensively in order to prevent the inpcb from being freed. 
The change I've made in 7.x is to guarantee that so_pcb will always be valid 
for a properly referenced socket, keeping the inpcb around until the socket is 
freed in the case of a reset, rather than leaving the socket without the inpcb 
(and hence requiring a lock to keep so_pcb valid).


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: tap(4) should go UP if opened

2007-03-14 Thread Bruce M. Simpson

Hi,

Frank Behrens wrote:
> If we have no possibility to mark the interface as UP for the non-root process the
> net.link.tap.user_open=1 is useless, because we can not transmit any packets. With the
> patch the interface goes UP only, when the administrator allowed non-root user access.

The conditional in the second patch is a no-op as the open will be 
forbidden if the user did not have privilege to open the tap. Bringing 
the interface up by default potentially violates POLA, so this should 
not happen by default.


Please try the attached patch, which puts this behaviour under a sysctl.

Thanks,
BMS
 //depot/user/bms/netdev/sys/net/if_tap.c#1 - /home/bms/p4/netdev/sys/net/if_tap.c 
--- /tmp/tmp.58336.0	Wed Mar 14 13:06:09 2007
+++ /home/bms/p4/netdev/sys/net/if_tap.c	Wed Mar 14 13:05:54 2007
@@ -150,7 +150,8 @@
  */
 static struct mtx		tapmtx;
 static int			tapdebug = 0;/* debug flag   */
-static int			tapuopen = 0;/* allow user open() */	 
+static int			tapuopen = 0;/* allow user open() */
+static int			tapuponopen = 0;/* IFF_UP on open() */
 static int			tapdclone = 1;	/* enable devfs cloning */
 static SLIST_HEAD(, tap_softc)	taphead; /* first device */
 static struct clonedevs 	*tapclones;
@@ -164,6 +165,8 @@
 "Ethernet tunnel software network interface");
 SYSCTL_INT(_net_link_tap, OID_AUTO, user_open, CTLFLAG_RW, &tapuopen, 0,
 	"Allow user to open /dev/tap (based on node permissions)");
+SYSCTL_INT(_net_link_tap, OID_AUTO, up_on_open, CTLFLAG_RW, &tapuponopen, 0,
+	"Bring interface up when /dev/tap is opened");
 SYSCTL_INT(_net_link_tap, OID_AUTO, devfs_cloning, CTLFLAG_RW, &tapdclone, 0,
 	"Enably legacy devfs interface creation");
 SYSCTL_INT(_net_link_tap, OID_AUTO, debug, CTLFLAG_RW, &tapdebug, 0, "");
@@ -502,6 +505,8 @@
 	s = splimp();
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
+	if (tapuponopen)
+		ifp->if_flags |= IFF_UP;
 	splx(s);
 
 	TAPDEBUG("%s is open. minor = %#x\n", ifp->if_xname, minor(dev));

Re: [PATCH] Removal of redundant entries from ifnet manpage

2007-03-14 Thread Bruce M. Simpson

Aniruddha Bohra wrote:
> Hi,
> The ifnet manpage contains entries for the following routines which do
> not exist in the ifnet struct.

committed, thanks!


Re: tap(4) should go UP if opened

2007-03-14 Thread Frank Behrens
Bruce,
many thanks for your fast response.

Bruce M. Simpson <[EMAIL PROTECTED]> wrote on 14 Mar 2007 13:09:
> The conditional in the second patch is a no-op as the open will be 
> forbidden if the user did not have privilege to open the tap. Bringing 

No. A process running with root rights can always open the tap.

> the interface up by default potentially violates POLA, so this should 
> not happen by default.

Ok, I see that the behaviour changes. 

I wonder who used the "tap user open" sysctl, when additional root rights are
necessary to bring the interface UP? I can't imagine a setup where this could
be used, somebody else?

> Please try the attached patch, which puts this behaviour under a sysctl.

Fine! This should work without problems. I agree with this solution, sounds
good. I'll test it and report the result.

Regards and thanks for your support,
   Frank
-- 
Frank Behrens, Osterwieck, Germany
PGP-key 0x5B7C47ED on public servers available.



Re: Who is to load dummynet.ko?

2007-03-14 Thread Yar Tikhiy
On Wed, Mar 14, 2007 at 04:35:06AM -0700, Luigi Rizzo wrote:
> On Wed, Mar 14, 2007 at 12:57:26PM +0300, Yar Tikhiy wrote:
> > On Tue, Mar 13, 2007 at 12:45:43AM -0700, Luigi Rizzo wrote:
> ...
> > > Making the load on demand would require a bit of additional code because
> > > it depends on the actual rules being loaded, and the rules are not
> > > parsed at load time. Plus, i believe that in a case like this
> > > the decision of which modules to load should be a conscious one
> > > taken upfront by the system administrator (i.e. end up in rc.conf
> > > or loader.conf) rather than be the result of the actual ipfw
> > > configuration.
> > 
> > Well, I used to stick to this opinion, too, in the good old days.
> > But today we are growing more and more modularity in our kernel,
> > and it's a nice feature to have.  With a lot of modules, the issue
> > of double configuration appears: if I want feature FOO, I have to
> 
> > yes this is also true.
> 
> > add its configuration AND not forget to load the respective module.
> > It can be a pain as the number of such cases rockets up.  Today at
> > least mount, ifconfig, and netgraph provide for loading modules on
> > demand, with the former two being system's core components.
> > 
> > I've just taken a look at the ipfw userland utility code.  It notices
> > a "pipe" or "queue" keyword in its command line rather early, and
> > it can be a good moment to check and load dummynet.ko.  Ditto for
> 
> actually, i think it is the kernel itself (in the setsockopt handler,
> once it validates the rule) that should load the module, and not leave
> the task to the userland utility. Other modules already do this,
> e.g. iwi loads the firmware autonomously, and maybe even netgraph components
> do something similar.
> 
> For dummynet and divert, this can be surely put in the setsockopt
> handler which is in ipfw.ko - if you need to autoload ipfw.ko,
> then i am not sure where to put the hooks (in the kernel) but i am
> pretty confident that there must be a good place.

As our fortune file puts it, "If God is dead, who will save the
Queen?" :-)  We seem to have a sort of a chicken and egg problem
here.  Perhaps that's why most auto-loading is done from the userland.
Of course, putting it in the kernel is better because it allows for
different control tools unconcerned with the modules (imagining
xipfw(8) here), but it has this shortcoming.  Another funny point
is that now it's impossible to unload a module auto-loaded by the
kernel itself.  I'm uncertain if it's an architectural limitation
or "just in case".

-- 
Yar


Re: Who is to load dummynet.ko?

2007-03-14 Thread Luigi Rizzo
On Wed, Mar 14, 2007 at 05:11:43PM +0300, Yar Tikhiy wrote:
> On Wed, Mar 14, 2007 at 04:35:06AM -0700, Luigi Rizzo wrote:
...
> > actually, i think it is the kernel itself (in the setsockopt handler,
> > once it validates the rule) that should load the module, and not leave
> > the task to the userland utility. Other modules already do this,
> > e.g. iwi loads the firmware autonomously, and maybe even netgraph components
> > do something similar.
> > 
> > For dummynet and divert, this can be surely put in the setsockopt
> > handler which is in ipfw.ko - if you need to autoload ipfw.ko,
> > then i am not sure where to put the hooks (in the kernel) but i am
> > pretty confident that there must be a good place.
> 
> As our fortune file puts it, "If God is dead, who will save the
> Queen?" :-)  We seem to have a sort of a chicken and egg problem

not really.
IP_FW_GET (and other commands) are processed in

sys/netinet/raw_ip.c::rip_ctloutput()

so it's there that we can try and autoload the module
(if ip_fw_ctl_ptr == NULL).
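
A minimal userland sketch of that idea, assuming the hook is a plain
function pointer that a loader fills in.  fw_ctl_ptr stands in for
ip_fw_ctl_ptr; fake_fw_ctl(), fake_autoload() and fw_ctloutput() are
hypothetical, and the real rip_ctloutput() is not structured like this.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for ip_fw_ctl_ptr: NULL until the firewall module is loaded. */
static int (*fw_ctl_ptr)(int opt);
static int autoload_calls;	/* counts load attempts, for illustration */

/* Hypothetical control function the module would install. */
static int
fake_fw_ctl(int opt)
{
	(void)opt;
	return (0);
}

/* Hypothetical loader; a kernel would request the module load here. */
static int
fake_autoload(void)
{
	autoload_calls++;
	fw_ctl_ptr = fake_fw_ctl;
	return (0);
}

/* Model of the dispatch: try to load once if the hook is missing. */
static int
fw_ctloutput(int opt)
{
	if (fw_ctl_ptr == NULL)
		(void)fake_autoload();
	if (fw_ctl_ptr == NULL)
		return (ENOPROTOOPT);	/* still no handler available */
	return (fw_ctl_ptr(opt));
}
```

The first call triggers the load and then dispatches; later calls find
the hook already set and never touch the loader again.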

I don't know if there are hooks to autoload a protocol stack, as some
are not modules (there is no ipv4.ko or ipv6.ko, but there is arcnet.ko).
In principle, though, you could do the same thing with protocols: anywhere
there is a missing function, annotate it with the functions to
autoload the module supplying it.

cheers
luigi


Re: Generic ioctl and ether_ioctl don't agree

2007-03-14 Thread Brooks Davis
On Wed, Mar 14, 2007 at 01:20:23PM +0300, Yar Tikhiy wrote:
> Hi folks,
> 
> Quite a while ago I noticed that our ioctl handlers get the ioctl
> command via u_long, but ether_ioctl()'s command argument is int.
> This disarray dates back to 1998, when ioctl functions started to
> take u_long as the command, but ether_ioctl() was never fixed.
> Fortunately, our ioctl command coding still fits in 32 bits, or
> else we would've got problems on 64-bit arch'es already.  I'd like
> to fix this long-standing bug some day after RELENG_7 is branched.
> Of course, this will break ABI to network modules on all 64-bit
> arch'es.  BTW, the same applies to other L2 layers, such as firewire,
> which seems to have been cloned from if_ethersubr.c.
> 
> Any objections or comments?  Thanks!

Why wait?  We're allowed to break module ABIs in current at any time and
there's no chance modules built on RELENG_6 will work on RELENG_7
trees anyway.

-- Brooks




Re: IPv6 over gif(4) broken in 6.2-RELEASE?

2007-03-14 Thread John Hay
Hi Tatuya,

Well after getting distracted for a while, I am back with this one.

On Fri, Jan 26, 2007 at 03:13:07AM +0900, JINMEI Tatuya wrote:
> > On Sun, 21 Jan 2007 09:32:44 +0200, 
> > John Hay <[EMAIL PROTECTED]> said:
> 
> >> There's another workaround for people stuck in this situation and who
> >> aren't in a position to try this diff.  That is to manually install
> >> the host route like this:
> >> 
> >> # route add -host -inet6 :::::2 -interface gif0 -nostatic -llinfo
> >> 
> >> Comments?
> 
> > Well it seems that even my stuff does not always work perfectly with that
> > change (1.48.2.15), so maybe we should revert it and I will search for
> > yet other ways to make FreeBSD's IPv6 code to actually work for our stuff.
> 
> > My "stuff" is a wireless IPv6 only network running in adhoc mode with
> > olsrd as the routing protocol. The problem is that all nodes on a subnet
> > cannot "see" each other, so olsrd needs to add routes to a node through
> > another node. Sometimes, just to complicate matters a little more, you
> > would want to have more than one network card in a host, all with the same
> > subnet address. (For instance on a high site, with sector antennas.)
> 
> > The case that I found that still does not work reliably, is if olsrd add
> > the route and route is not immediately used, then the nd code will time
> > it out and remove it.
> 
> I think I'm responsible for the troubles.  I've been thinking about
> how to meet all the requests, and concluded that it's more complicated
> than I originally thought.
> 
> I've come across an idea that may solve the problems, but I'll need
> more time to implement and test it.
> 
> At the moment, I suggest reverting the 1.48.2.16 change for those who
> simply wanted a gif to work.
> 
> Regarding the OLSRD stuff, I'd like to know more specific features
> that are sought.  For example,
> 
> - what should happen if link-layer address resolution fails?  Should
>   then entry be removed?  Probably not according to your description
>   above, but what would you expect the entry to become in this case?
> 
> - once the link-layer address is resolved for the entry, should it be
>   regarded as "permanent" without any ND state changes?  For example,
>   should NUD be performed on the cache?  If yes, what should happen if
>   NUD detects the neighbor is unreachable?  Should the entry be
>   removed?  Again, probably not, but then what should it become?
>   Keeping it with the same link-layer address?  Keeping it with an
>   empty link-layer address?  Others?  What if the neighbor is acting
>   as a router (setting the router flag in NAs)?  Should destination
>   caches using the now-unreachable router be removed as described in
>   the protocol spec?  Or should the destination caches be intact?
> 
> I have my own speculation on these points, but I'd like to know what
> the actual user(s) of these features want before taking any action
> based on the speculation.

Maybe some background.

Olsrd (http://www.olsr.org/) does not use link-local addresses. I
think it might have made things simpler...

Unless you kill it in a weird way, it should remove routes that it
has added, so I guess we don't really need a timeout.

I can think of 3 types of routes that olsr uses:

1) A direct route. With a single interface, you actually would not need
   these because the standard FreeBSD/Kame IPv6 code would handle it.
   The problem comes in when you have more than one interface (and in
   the same subnet). I think it would be great if I could just tell the
   kernel which interface to use and let it do all the normal IPv6
   stuff to make comms work to that host, but do it on the specified
   interface. If it then times out the low-level stuff because of comms
   problems, that is fine, as long as it remembers which interface to
   use when it has to try again.

   Let me try a picture:

   An olsr router may have more than one wireless interface to cover
   different areas. In this example ath0 and ath1 are configured with
   the same IPv6 subnet, eg. fd12:3:4:5::/48

        -------------
        |           |
        |     D     |
        |           |
        |ath0   ath1|
        -------------
      )-|           |-(
     )                 (
    )                   (


     A        B        C

   Then the client might be somewhere between A and B and sometimes
   work through ath0 and sometimes through ath1. Olsrd must be able
   to tell the kernel which interface to use. And it must keep on
   using that interface until olsrd deletes that route and adds another
   one.

2) A host route through another host. In my above picture, C might
   be too far to reach D directly, so it will need to add a route
   through B to

Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network

2007-03-14 Thread Bruce M. Simpson

Gleb Smirnoff wrote:
> AFAIK, the problem needs a more generic approach. I see two approaches.
> 
> 1) Introduce RTM_CHANGEADD, a command that will forcibly add route,
> deleting all conflicting ones. Use this command in in_addprefix().
> 
> 2) In rt_flags field we still have several extra bits. We can use
> them to specify route source - RTS_CONNECTED, RTS_STATIC, RTS_XXX,
> where XXX is a routing protocol. When issuing RTM_ADD a route with
> a preferred source (e.g. CONNECTED vs STATIC) will override the old
> one.

The proposed changes also constitute a hack.

I understand that they are being proposed to address problems we 
currently have in the stack, i.e. that we do not support multipathing, 
though it is more than likely they will be blown away in future when the 
architecture changes (and it has to change).


Approach 1 is largely irrelevant if multiple paths are introduced to the 
network stack; there is then no concept of a conflicting forwarding 
entry, only preference derived from the interface, entry flags, or the 
entry ('route') itself.


Approach 2 has some merit to it, although the forwarding plane should 
not care where the forwarding entry came from unless it needs to (e.g. 
next-hop resolution).


It seems reasonable that the forwarding plane should tag entries as 
being 'CONNECTED' i.e. derived from the address configuration of an 
interface. I believe many implementations out there do this, and 
multi-path does not change this.


We already have the RTF_PROTO1 flag to determine if the forwarding entry 
('route') came from a routing protocol in userland, so there should be 
no need to change the existing flags.


The RTF_STATIC flag only has special meaning in that it means 'the user 
added this forwarding entry manually via the route(8) command'. We 
should preserve these semantics, though I believe we should start 
implementing forwarding preference in the radix trie.


I think it seems acceptable and reasonable that we use a limited form of 
Approach 2 to clobber 'routes' being added in the case described in the 
PR, until such time as the network stack is re-engineered to support 
multiple paths and forwarding preference.


I also believe it is useful if we start to use more modern technical 
jargon to discuss 'routes' in the network stack, because we are actually 
discussing the behaviour of entries in a forwarding table.


Regards,
BMS


Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network

2007-03-14 Thread Gleb Smirnoff
On Wed, Mar 14, 2007 at 04:00:13PM +, Bruce M. Simpson wrote:
B> The proposed changes also constitute a hack.
B> 
B> I understand that they are being proposed to address problems we 
B> currently have in the stack, i.e. that we do not support multipathing, 
B> though it is more than likely they will be blown away in future when the 
B> architecture changes (and it has to change).
B> 
B> Approach 1 is largely irrelevant if multiple paths are introduced to the 
B> network stack; there is then no concept of a conflicting forwarding 
B> entry, only preference derived from the interface, entry flags, or the 
B> entry ('route') itself.
B> 
B> Approach 2 has some merit to it, although the forwarding plane should 
B> not care where the forwarding entry came from unless it needs to (e.g. 
B> next-hop resolution).
B> 
B> It seems reasonable that the forwarding plane should tag entries as 
B> being 'CONNECTED' i.e. derived from the address configuration of an 
B> interface. I believe many implementations out there do this, and 
B> multi-path does not change this.
B> 
B> We already have the RTF_PROTO1 flag to determine if the forwarding entry 
B> ('route') came from a routing protocol in userland, so there should be 
B> no need to change the existing flags.
B> 
B> The RTF_STATIC flag only has special meaning in that it means 'the user 
B> added this forwarding entry manually via the route(8) command'. We 
B> should preserve these semantics, though I believe we should start 
B> implementing forwarding preference in the radix trie.
B> 
B> I think it seems acceptable and reasonable that we use a limited form of 
B> Approach 2 to clobber 'routes' being added in the case described in the 
B> PR, until such time as the network stack is re-engineered to support 
B> multiple paths and forwarding preference.
B> 
B> I also believe it is useful if we start to use more modern technical 
B> jargon to discuss 'routes' in the network stack, because we are actually 
B> discussing the behaviour of entries in a forwarding table.

I was afraid that this would raise an argument about multipath routing.
Let's set multipath aside for now and just decide: what is the correct
way to remove conflicting routes when we assign an IP prefix to a
local interface?

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE


Re: kern/106722: [net] [patch] ifconfig may not connect an interface to known network

2007-03-14 Thread Bruce M. Simpson

Gleb Smirnoff wrote:

> I was afraid that this would raise an argument about multipath routing.
> Let's set multipath aside for now and just decide: what is the correct
> way to remove conflicting routes when we assign an IP prefix to a
> local interface?
  
My suggestion is to take the second approach you outlined but modify it 
slightly.


That way, the conflict between the 'connected' FTE introduced by 
ifconfig'ing the interface and the pre-existing FTE for that network 
prefix, may be resolved in a manner which doesn't break current 
consumers of the routing code, and leaves the way open to do multipath 
later w/o problems.


Regards,
BMS



Re[2]: kern/106722: [net] [patch] ifconfig may not connect an interface to known network

2007-03-14 Thread Anton Yuzhaninov
Wednesday, March 14, 2007, 7:00:13 PM, Bruce M. Simpson wrote:

BMS> Gleb Smirnoff wrote:
>> AFAIK, the problem needs a more generic approach. I see two approaches.
>>
>> 1) Introduce RTM_CHANGEADD, a command that will forcibly add route,
>> deleting all conflicting ones. Use this command in in_addprefix().
>>
>> 2) In rt_flags field we still have several extra bits. We can use
>> them to specify route source - RTS_CONNECTED, RTS_STATIC, RTS_XXX,
>> where XXX is a routing protocol. When issuing RTM_ADD a route with
>> a preferred source (e.g. CONNECTED vs STATIC) will override the old
>> one.
>>

BMS> I understand that they are being proposed to address problems we 
BMS> currently have in the stack, i.e. that we do not support multipathing,
BMS> though it is more than likely they will be blown away in future when the
BMS> architecture changes (and it has to change).

IMHO the question is not related to multipathing.
Kernel routes currently don't carry an administrative distance, and
that is the root of this problem.

RTS_CONNECTED and RTS_STATIC are a hack that adds some fixed AD values
without increasing the size of a route in memory.
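
To make the idea of a fixed administrative distance encoded in spare flag
bits concrete, here is a minimal C sketch. It is not taken from any patch
in this thread. The RTS_CONNECTED and RTS_STATIC names come from the
proposal above; the bit positions (RTS_SHIFT, RTS_MASK), the RTS_PROTO
placeholder for RTS_XXX, the helper functions and the rule that ties keep
the existing route are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: two spare bits of a 32-bit flags word hold the route source. */
#define RTS_SHIFT 28
#define RTS_MASK  (0x3u << RTS_SHIFT)

/* Lower value means more preferred, like an administrative distance. */
enum rt_source {
    RTS_CONNECTED = 0,  /* derived from interface address configuration */
    RTS_STATIC    = 1,  /* added manually by the administrator */
    RTS_PROTO     = 2   /* hypothetical stand-in for RTS_XXX */
};

/* Store the source in the spare bits, leaving all other flags untouched. */
static inline uint32_t rts_set(uint32_t flags, enum rt_source src)
{
    return (flags & ~RTS_MASK) | ((uint32_t)src << RTS_SHIFT);
}

/* Extract the source from a flags word. */
static inline enum rt_source rts_get(uint32_t flags)
{
    return (enum rt_source)((flags & RTS_MASK) >> RTS_SHIFT);
}

/*
 * Decide whether a route being added should replace an existing route
 * for the same prefix. Assumption: only a strictly more preferred
 * source overrides; equal preference keeps the old entry.
 */
static inline bool rts_overrides(uint32_t new_flags, uint32_t old_flags)
{
    return rts_get(new_flags) < rts_get(old_flags);
}
```

With this encoding, a route tagged RTS_CONNECTED would override an
existing RTS_STATIC route for the same prefix, but not the other way
round, and the route structure does not grow.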

-- 
 WBR,
 Anton Yuzhaninov



fbsd amd64 and fast_ipsec

2007-03-14 Thread rms_zaphod

OK, I have used these kernel options on my file server/NAT/router/firewall
servers for years (kernel options first, then the question):

#Mt e
options IPFIREWALL
options IPDIVERT
options IPFIREWALL_VERBOSE
options IPSTEALTH
options FAST_IPSEC
#smbfs stuff
options NETSMB
options NETSMBCRYPTO
options LIBMCHAIN
options LIBICONV
options SMBFS
#end smbfs stuff
device crypto

With 6.2, after the latest (3.13.07) update via
cvsup -L 2 -h `(fastest_cvsup -q -c us )` /root/stable-supfile
followed by make buildworld etc., I STILL cannot get setkey or racoon to
work. I keep getting a pfkey error and cannot establish a VPN tunnel. I can
if I use:

options IPSEC
options IPSEC_ESP
options IPSEC_DEBUG

But then the Giant mutex is enabled, etc. However, all was good up to 6.1.
Is this broken in 6.2?


Scalability problem from route refcounting

2007-03-14 Thread Kris Kennaway
I have recently started looking at database performance over gigabit
ethernet, and there seems to be a bottleneck coming from the way route
reference counting is implemented.  On an 8-core system it looks like
we spend a lot of time waiting for the rtentry mutex:

   max     total wait_total    count  avg wait_avg cnt_hold cnt_lock  name
[...]
   408    950496    1135994   301418    3        3    24876    55936  net/if_ethersubr.c:397 (sleep mutex:bge1)
   974    968617    1515169   253772    3        5    14741    60581  dev/bge/if_bge.c:2949 (sleep mutex:bge1)
  2415  18255976    1607511   253841   71        6   125174     3131  netinet/tcp_input.c:770 (sleep mutex:inp)
   233   1850252    2080506   141817   13       14        0   126897  netinet/tcp_usrreq.c:756 (sleep mutex:inp)
   384   6895050    2737492   299002   23        9    92100    73942  dev/bge/if_bge.c:3506 (sleep mutex:bge1)
   626   5342286    2760193   301477   17        9    47616    54158  net/route.c:147 (sleep mutex:radix node head)
   326   3562050    3381510   301477   11       11   133968   110104  net/route.c:197 (sleep mutex:rtentry)
   146    947173    5173813   301477    3       17    44578   120961  net/route.c:1290 (sleep mutex:rtentry)
   146    953718    5501119   301476    3       18    63285   121819  netinet/ip_output.c:610 (sleep mutex:rtentry)
    50   4530645    7885304  1423098    3        5   642391   788230  kern/subr_turnstile.c:489 (spin mutex:turnstile chain)

i.e. during a 30 second sample we spend a total of >14 seconds (on all
cpus) waiting to acquire the rtentry mutex.

This appears to be because (among other things), we increment and then
decrement the route refcount for each packet we send, each of which
requires acquiring the rtentry mutex for that route before adjusting
the refcount.  So multiplexing traffic for lots of connections over a
single route is being partly rate-limited by those mutex operations.
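
The per-packet pattern described above can be sketched in userland C as
follows. This is only an illustration, not the kernel code: struct
route_entry, route_ref(), route_unref() and send_packets() are
hypothetical names, a pthread mutex stands in for the rtentry mutex, and
the lock_ops counter exists purely to make the cost visible.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a route entry with its own mutex. */
struct route_entry {
    pthread_mutex_t lock;     /* stands in for the rtentry mutex */
    long            refcnt;   /* reference count protected by lock */
    unsigned long   lock_ops; /* number of lock acquisitions so far */
};

static void route_init(struct route_entry *rt)
{
    pthread_mutex_init(&rt->lock, NULL);
    rt->refcnt = 1;           /* one reference held by the table itself */
    rt->lock_ops = 0;
}

/* Take a reference: one full lock/unlock cycle. */
static void route_ref(struct route_entry *rt)
{
    pthread_mutex_lock(&rt->lock);
    rt->lock_ops++;
    rt->refcnt++;
    pthread_mutex_unlock(&rt->lock);
}

/* Drop a reference: another full lock/unlock cycle. */
static void route_unref(struct route_entry *rt)
{
    pthread_mutex_lock(&rt->lock);
    rt->lock_ops++;
    assert(rt->refcnt > 0);
    rt->refcnt--;
    pthread_mutex_unlock(&rt->lock);
}

/* Every packet sent over the route pays for both operations. */
static void send_packets(struct route_entry *rt, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
        route_ref(rt);
        /* ... the packet would be built and transmitted here ... */
        route_unref(rt);
    }
}
```

In this sketch, sending n packets over one route takes the same mutex 2n
times, so every connection sharing that route serializes on it, which is
the contention pattern the profile points at.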

This is not the end of the story, though: the bge driver is a serious
bottleneck on its own. (I nulled out the route locking, since it is
not relevant in my environment, at least for the purposes of this
test, and that exposed bge as the next problem. Other drivers
may not be so bad.)

Kris





Re: Scalability problem from route refcounting

2007-03-14 Thread Kip Macy

Apologies in advance if you have already answered this question
elsewhere - can you point me to a HOWTO for replicating the test in my
local environment?

-Kip



Re: Scalability problem from route refcounting

2007-03-14 Thread Kris Kennaway
On Wed, Mar 14, 2007 at 06:45:20PM -0700, Kip Macy wrote:
> Apologies in advance if you have already answered this question
> elsewhere - can you point me to a HOWTO for replicating the test in my
> local environment?

Well, it's not completely spelled out, but most of the steps are
documented in http://people.freebsd.org/~kris/scaling/mysql.html and
the references therein.

You'll probably want to use my kris-contention p4 branch to avoid the
scaling bottlenecks we have identified and solved so far.

Kris