Re: Bug in net/route.c function rtredirect()

2002-06-04 Thread Andre Oppermann

Ruslan Ermilov wrote:
> 
> On Tue, Jun 04, 2002 at 12:05:51AM +0200, Andre Oppermann wrote:
> > After reading this whole redirect stuff a couple of times I've come to
> > the conclusion that the function is right as it is there. There is no
> > such bug as I described it. The rtalloc1() in rtredirect() is
> > incrementing the refcount on the route found, the two rtfree() after
> > "create:" and "done:" will decrement it as must happen (we don't
> > have a reference to that route because we don't clone here).
> >
> Right.  There's just no point in calling rtfree() just to decrement
> the route's rt_refcnt (we were only looking up the route, we didn't
> need a reference to it).  rtfree() does not free the route because
> it's still RTF_UP, and calling rtfree() also needlessly calls
> rnh_close() (in_clsroute() in the PF_INET case).  Hence this would
> still be a slight optimization (tested this time):

This breaks if someone hacks up the routing table. The vast majority
is using rtfree() (actually the RTFREE macro which checks ==0 first).
To be entirely correct and clean the macro RTFREE would be the right
thing to do here.
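
For readers following the rtfree()/RTFREE distinction, here is a minimal user-space sketch of the pattern being discussed. It is not the kernel source: `toy_route`, `toy_rtfree()` and `TOY_RTFREE()` are hypothetical stand-ins, and the exact conditions are assumptions made for illustration. The idea shown is that the macro takes the heavier free path only for the last reference and otherwise just decrements the counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct rtentry; not the kernel definition. */
struct toy_route {
	int  refcnt;      /* models rt_refcnt */
	bool up;          /* models the RTF_UP flag */
	bool freed;       /* set once the entry would have been released */
	int  close_calls; /* how often the rnh_close()-style hook ran */
};

/* Models rtfree() (assumed behaviour): drop one reference; when the
 * count reaches zero, run the close hook, and release the entry only
 * if it is no longer up. */
static void
toy_rtfree(struct toy_route *rt)
{
	rt->refcnt--;
	if (rt->refcnt <= 0) {
		rt->close_calls++;
		if (!rt->up)
			rt->freed = true;
	}
}

/* Models the RTFREE pattern: take the slow path only for the last
 * reference, otherwise just decrement the counter. */
#define TOY_RTFREE(rt)				\
	do {					\
		if ((rt)->refcnt <= 1)		\
			toy_rtfree(rt);		\
		else				\
			(rt)->refcnt--;		\
	} while (0)
```

With a route that holds two references, the first TOY_RTFREE() only decrements; the second one goes through toy_rtfree(), which runs the close hook but leaves an "up" route in place.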

> %%%
> Index: net/route.c
> ===
> RCS file: /home/ncvs/src/sys/net/route.c,v
> retrieving revision 1.69
> diff -u -p -r1.69 route.c
> --- net/route.c 19 Mar 2002 21:54:18 -  1.69
> +++ net/route.c 4 Jun 2002 06:41:42 -
> @@ -345,7 +345,7 @@ rtredirect(dst, gateway, netmask, flags,
>  */
> create:
> if (rt)
> -   rtfree(rt);
> +   rt->rt_refcnt--;
> flags |=  RTF_GATEWAY | RTF_DYNAMIC;
> bzero((caddr_t)&info, sizeof(info));
> info.rti_info[RTAX_DST] = dst;
> %%%
> 
> > A bug is that host routes created by redirect are never being purged.
> > But that one has been present for a long (?) time.
> >
> > I'm still trying to track a problem with the following situation: 1. I
> > get a tcp syn from somewhere, 2. a host route is created by tcp, 3.
> > the synack is sent back, 4. we get a redirect from the router to use
> > a different router, 5. host route created from tcp is updated and
> > replaced by icmp redirect route, 6. I see a RTM_LOSING message and
> > the redirect route is being purged.
> >
> This is handled by in_losing().  in_losing() has all the necessary
> comments explaining what's going on here.

Yes, I see what in_losing() is doing and why. What I'm wondering is
why packets are being lost so that in_losing() is triggered. In my
network such packet loss is not supposed to happen. Otherwise I could only
imagine getting a bogus redirect. That would be a problem on the router
(which is also FreeBSD based). Anyway, I'll investigate my problem here
further and see where that takes me...

> > This happens in, I think, some 5% of the cases. What I'm tracking
> > is why the rtfree() after "create:" is de-refcounting the default
> > route when we should have updated the host route created by tcp
> > before. Maybe this is a side-effect of the tcp syncache and the
> > flow has changed? I'll track what happens there.
> >
> There may be only one reason: there's no (yet) route created by tcp.
> I can't reproduce it here.

In tcp_input.c there is a route lookup (via tcp_rtlookup) in the function
tcp_mssopt(). tcp_mssopt is supposed to look up the outgoing interface
to find out the MSS this host supports. tcp_rtlookup rtallocs
a route and clones it if necessary. This is before any redirect can
happen because it's before we've sent out the packet which triggers
the redirect. syncache_respond is doing the same. So there should be
a route existing prior to the redirect?!

(UDP is not doing this, it does not request a host route but simply
takes whichever route matches, usually the default route).

> > > Heh, so you in fact tried to rtfree() "rt" in "done:" by adding
> > > "rtn".  And how *rtp (if rtp != NULL) will become "rtn" then?
> > > What about this?
> >
> > No. No bug as I said, no need to patch. Sorry for this trouble.
> >
> > For the expiration of redirects I'll port/adapt the NetBSD solution
> > and post the patch here.
> >
> We could treat RTF_DYNAMIC routes just like RTF_WASCLONED ones.
> Seems to work just fine here:

Ah, an even smarter solution! I'll try this on my box this evening!

> %%%
> Index: netinet/in_rmx.c
> ===
> RCS file: /home/ncvs/src/sys/netinet/in_rmx.c,v
> retrieving revision 1.42
> diff -u -p -r1.42 in_rmx.c
> --- netinet/in_rmx.c 19 Mar 2002 21:25:46 -  1.42
> +++ netinet/in_rmx.c 4 Jun 2002 06:41:42 -
> @@ -202,8 +202,10 @@ in_clsroute(struct radix_node *rn, struc
> if((rt->rt_flags & (RTF_LLINFO | RTF_HOST)) != RTF_HOST)
> return;
> 
> -   if((rt->rt_flags & (RTF_WASCLONED | RTPRF_OURS))
> -

Re: Bug in net/route.c function rtredirect()

2002-06-04 Thread Ruslan Ermilov

On Tue, Jun 04, 2002 at 10:24:49AM +0200, Andre Oppermann wrote:
> Ruslan Ermilov wrote:
> > 
> > On Tue, Jun 04, 2002 at 12:05:51AM +0200, Andre Oppermann wrote:
> > > After reading this whole redirect stuff a couple of times I've come to
> > > the conclusion that the function is right as it is there. There is no
> > > such bug as I described it. The rtalloc1() in rtredirect() is
> > > incrementing the refcount on the route found, the two rtfree() after
> > > "create:" and "done:" will decrement it as must happen (we don't
> > > have a reference to that route because we don't clone here).
> > >
> > Right.  There's just no point in calling rtfree() just to decrement
> > the route's rt_refcnt (we were only looking up the route, we didn't
> > need a reference to it).  rtfree() does not free the route because
> > it's still RTF_UP, and calling rtfree() also needlessly calls
> > rnh_close() (in_clsroute() in the PF_INET case).  Hence this would
> > still be a slight optimization (tested this time):
> 
> This breaks if someone hacks up the routing table.
> 
What do you mean by "hacks"?

> The vast majority
> is using rtfree() (actually the RTFREE macro which checks ==0 first).
> To be entirely correct and clean the macro RTFREE would be the right
> thing to do here.
> 
Well, rtinit() does the same decrementing.

> > %%%
> > Index: net/route.c
> > ===
> > RCS file: /home/ncvs/src/sys/net/route.c,v
> > retrieving revision 1.69
> > diff -u -p -r1.69 route.c
> > --- net/route.c 19 Mar 2002 21:54:18 -  1.69
> > +++ net/route.c 4 Jun 2002 06:41:42 -
> > @@ -345,7 +345,7 @@ rtredirect(dst, gateway, netmask, flags,
> >  */
> > create:
> > if (rt)
> > -   rtfree(rt);
> > +   rt->rt_refcnt--;
> > flags |=  RTF_GATEWAY | RTF_DYNAMIC;
> > bzero((caddr_t)&info, sizeof(info));
> > info.rti_info[RTAX_DST] = dst;
> > %%%
> > 
> > > A bug is that host routes created by redirect are never being purged.
> > > But that one has been present for a long (?) time.
> > >
> > > I'm still trying to track a problem with the following situation: 1. I
> > > get a tcp syn from somewhere, 2. a host route is created by tcp, 3.
> > > the synack is sent back, 4. we get a redirect from the router to use
> > > a different router, 5. host route created from tcp is updated and
> > > replaced by icmp redirect route, 6. I see a RTM_LOSING message and
> > > the redirect route is being purged.
> > >
> > This is handled by in_losing().  in_losing() has all the necessary
> > comments explaining what's going on here.
> 
> Yes, I see what in_losing() is doing and why. What I'm wondering is
> why packets are being lost so that in_losing() is triggered. In my
> network such packet loss is not supposed to happen. Otherwise I could only
> imagine getting a bogus redirect. That would be a problem on the router
> (which is also FreeBSD based). Anyway, I'll investigate my problem here
> further and see where that takes me...
> 
OK, you're the best person to try to work that out.  :-)

> > > This happens in, I think, some 5% of the cases. What I'm tracking
> > > is why the rtfree() after "create:" is de-refcounting the default
> > > route when we should have updated the host route created by tcp
> > > before. Maybe this is a side-effect of the tcp syncache and the
> > > flow has changed? I'll track what happens there.
> > >
> > There may be only one reason: there's no (yet) route created by tcp.
> > I can't reproduce it here.
> 
> In tcp_input.c there is a route lookup (via tcp_rtlookup) in the function
> tcp_mssopt(). tcp_mssopt is supposed to look up the outgoing interface
> to find out the MSS this host supports. tcp_rtlookup rtallocs
> a route and clones it if necessary. This is before any redirect can
> happen because it's before we've sent out the packet which triggers
> the redirect. syncache_respond is doing the same. So there should be
> a route existing prior to the redirect?!
> 
This TCP-cloned route is only installed after a connection is
established.  Packets already flow before it is established.
Could it be that you're seeing these RTM_LOSING messages
during the connection establishment phase?  What does tcpdump(1)
show you?  What does ``route -vn monitor'' tell you?

> (UDP is not doing this, it does not request a host route but simply
> takes whichever route matches, usually the default route).
> 
I know, there are no per-destination metrics for UDP.  :-)

> > > > Heh, so you in fact tried to rtfree() "rt" in "done:" by adding
> > > > "rtn".  And how *rtp (if rtp != NULL) will become "rtn" then?
> > > > What about this?
> > >
> > > No. No bug as I said, no need to patch. Sorry for this trouble.
> > >
> > > For the expiration of redirects I'll port/adapt the NetBSD solution
> > > a

ospf

2002-06-04 Thread Oles' Hnatkevych

Hello ppl

   I need a rock-solid OSPF daemon that works on FreeBSD-4.5.
   Zebra has not proved to be one: when tun(4) interfaces
   are created on the fly, it crashes the box.
   Please advise which port I should use.

With best wishes, Oles' Hnatkevych, http://gnut.kiev.ua, [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-net" in the body of the message



Re: ospf

2002-06-04 Thread Dean Strik

Oles' Hnatkevych wrote:
>I need a rock-solid OSPF daemon that works on FreeBSD-4.5.
>Zebra has not proved to be one: when tun(4) interfaces
>are created on the fly, it crashes the box.

Which is not zebra's fault... better to fix the system than to work around it
by using another program...

-- 
Dean C. Strik Eindhoven University of Technology
[EMAIL PROTECTED]  |  [EMAIL PROTECTED]  |  http://www.ipnet6.org/
"This isn't right. This isn't even wrong." -- Wolfgang Pauli




Re: Race condition with M_EXT ref count?

2002-06-04 Thread Andrew Gallatin


Archie Cobbs writes:
 > Re: the -stable patch. I agree we need a more general MFC/cleanup
 > of some of the mbuf improvements from -current into -stable.
 > If I find time perhaps I'll do that as well, but in a separate patch.
 > For the present time, I'll commit this once 4.6-REL is done.

The best improvements (IMHO, anyway) are the changes in the way
external mbufs are referenced and dereferenced.  It greatly simplifies
writing code that uses external mbufs.

However, you need to carefully consider what you backport.  We want to
try to avoid breaking ABI compatibility for 3rd party vendors who ship
binary network interface drivers.  At Myricom, we offer a binary driver
built on 4.1.1-RELEASE and it works through 4.5-RELEASE(*).  It would be
bad to lose this feature.

So I'd oppose MFC'ing anything that breaks binary compatibility for
network drivers in the 4.x line.  The last thing we want to do is make
extra work for the few companies who offer FreeBSD drivers for their
hardware.

Cheers,

Drew

(*) We also ship source, but building from source is somewhat of a pain.




Re: Dummynet WFQ

2002-06-04 Thread Lars Eggert

Luigi Rizzo wrote:
> the signal that tells the WFQ algorithm when you can transmit the
> next packet comes from the pipe. The latter ticks either at a
> predefined rate (as configured with the 'bw NNN bit/s' parameter),
> or from the tx interrupt coming from a device (e.g. you can say
> something like 'bw ed0' to get the transmit clock from device ed0).
> 
> HOWEVER: i have implemented the necessary machinery in dummynet (it
> is a function called if_tx_rdy()) and in the user interface, "ipfw",
> but have not added the hooks to call if_tx_rdy() in each device
> driver because these calls are somewhat expensive, and you probably do not want
> them on a 100Mbit/s interface.
> 
> See http://www.geocrawler.com/archives/3/165/2002/3/0/8222181/
> on how to use them (a search for "dummynet if_tx_rdy()" should return
> some results).

I'm trying to merge this into the sis driver, which seems to batch
transmissions together. For clarification, do you expect one if_tx_rdy()
call per packet or one per batch? Per packet may result in a burst of
these calls; does dummynet handle this?

Thanks,
Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: Dummynet WFQ

2002-06-04 Thread Lars Eggert

Lars Eggert wrote:
> I'm trying to merge this into the sis driver, which seems to batch
> transmissions together. For clarification, do you expect one if_tx_rdy()
> call per packet or one per batch? Per packet may result in a burst of
> these calls; does dummynet handle this?

Oh, I'm also using your "polling" version of the sis driver - maybe 
that'd reduce the overhead of if_tx_rdy() you mentioned?

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: Dummynet WFQ

2002-06-04 Thread Luigi Rizzo

Most device drivers batch transmissions, but if you use the interface
as a clock for the pipe, dummynet will only send a single packet
at a time to the device, so you won't have to bother about the
batching.

The overhead is in the fact that if_tx_rdy() has to scan all pipes
to find the one that needs the signal. Polling won't help with this,
whereas a direct pointer from the interface to the pipe would
(but we'd need to extend the struct ifnet, and do the appropriate
garbage collection when interfaces and/or pipes are added/deleted).
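
The trade-off described above can be sketched in user-space C. This is only an illustration under stated assumptions: `toy_ifnet`, `toy_pipe`, `find_pipe_scan()`, `toy_attach()` and `toy_detach()` are hypothetical names, not dummynet or ifnet code. The scan costs time proportional to the number of pipes; the back-pointer is constant time but has to be cleared whenever a pipe or interface goes away.

```c
#include <stddef.h>

struct toy_pipe;

/* Hypothetical interface record; the back-pointer models the extra
 * field that struct ifnet would need. */
struct toy_ifnet {
	const char      *name;
	struct toy_pipe *pipe;  /* NULL when no pipe is clocked by this interface */
};

struct toy_pipe {
	struct toy_ifnet *clock_ifp;  /* interface used as the transmit clock */
};

/* Scan approach: walk every pipe to find the one tied to ifp. */
static struct toy_pipe *
find_pipe_scan(struct toy_pipe *pipes, size_t n, const struct toy_ifnet *ifp)
{
	for (size_t i = 0; i < n; i++)
		if (pipes[i].clock_ifp == ifp)
			return &pipes[i];
	return NULL;
}

/* Pointer approach: link both sides so the lookup is ifp->pipe. */
static void
toy_attach(struct toy_pipe *p, struct toy_ifnet *ifp)
{
	p->clock_ifp = ifp;
	ifp->pipe = p;
}

/* The bookkeeping cost: both sides must be unlinked on deletion. */
static void
toy_detach(struct toy_pipe *p)
{
	if (p->clock_ifp != NULL)
		p->clock_ifp->pipe = NULL;
	p->clock_ifp = NULL;
}
```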

BTW if you use polling, you have to be careful in the place where you
put the call to if_tx_rdy() to make sure that it catches the tx queue
becoming empty only once and not at every polling cycle.
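
One way to read the "only once" requirement is as edge detection on the queue state. The sketch below assumes a hypothetical per-device record `toy_txq` and a hypothetical `toy_poll()` routine; a counter stands in for the real if_tx_rdy(ifp) call. None of these names come from the sis driver.

```c
#include <stdbool.h>

/* Hypothetical per-device state for illustration only. */
struct toy_txq {
	int  queued;         /* packets still waiting in the tx queue */
	bool was_empty;      /* queue state seen on the previous poll */
	int  ready_signals;  /* how many times the "tx ready" hook fired */
};

/* Called once per polling cycle. Fires the hook only on the
 * non-empty -> empty transition, not on every poll that finds
 * the queue already empty. */
static void
toy_poll(struct toy_txq *q)
{
	bool empty = (q->queued == 0);

	if (empty && !q->was_empty)
		q->ready_signals++;  /* stands in for if_tx_rdy(ifp) */
	q->was_empty = empty;
}
```

Polling an idle device repeatedly leaves the counter unchanged; it advances again only after the queue has been refilled and drained.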

cheers
luigi

p.s. the soekris boxes are becoming popular, aren't they!


On Tue, Jun 04, 2002 at 08:50:39AM -0700, Lars Eggert wrote:
> Lars Eggert wrote:
> > I'm trying to merge this into the sis driver, which seems to batch
> > transmissions together. For clarification, do you expect one if_tx_rdy()
> > call per packet or one per batch? Per packet may result in a burst of
> > these calls; does dummynet handle this?
> 
> Oh, I'm also using your "polling" version of the sis driver - maybe 
> that'd reduce the overhead of if_tx_rdy() you mentioned?
> 
> Lars
> -- 
> Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute






Re: Dummynet WFQ

2002-06-04 Thread Lars Eggert

Luigi Rizzo wrote:
> BTW if you use polling, you have to be careful in the place where you
> put the call to if_tx_rdy() to make sure that it catches the tx queue
> becoming empty only once and not at every polling cycle.

How about at the very end of sis_intr(), as a new "else" branch of the 
queue length check, like this:

        if (ifp->if_snd.ifq_head != NULL)
                sis_start(ifp);
+       else
+               if_tx_rdy(ifp);

That doesn't seem to be in the codepath that gets executed on each poll, 
right?

> p.s. the soekris boxes are becoming popular, aren't they!

They are amazing, I'm really glad the folks on freebsd-small have 
pointed us at them. Only downside is that you go blind if you look at 
the case for too long :-)

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: Dummynet WFQ

2002-06-04 Thread Luigi Rizzo

On Tue, Jun 04, 2002 at 09:22:13AM -0700, Lars Eggert wrote:
> Luigi Rizzo wrote:
> > BTW if you use polling, you have to be careful in the place where you
> > put the call to if_tx_rdy() to make sure that it catches the tx queue
> > becoming empty only once and not at every polling cycle.
> 
> How about at the very end of sis_intr(), as a new "else" branch of the 
> queue length check, like this:

except that sis_intr is never called when you use polling :(

cheers
luigi
>         if (ifp->if_snd.ifq_head != NULL)
>                 sis_start(ifp);
> +       else
> +               if_tx_rdy(ifp);
> 
> That doesn't seem to be in the codepath that gets executed on each poll, 
> right?
> 
> > p.s. the soekris boxes are becoming popular, aren't they!
> 
> They are amazing, I'm really glad the folks on freebsd-small have 
> pointed us at them. Only downside is that you go blind if you look at 
> the case for too long :-)
> 
> Lars
> -- 
> Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute






Re: Dummynet WFQ

2002-06-04 Thread Lars Eggert

Luigi Rizzo wrote:
 >>> BTW if you use polling, you have to be careful in the place
 >>> where you put the call to if_tx_rdy() to make sure that it
 >>> catches the tx queue becoming empty only once and not at every
 >>> polling cycle.
 >>
 >> How about at the very end of sis_intr(), as a new "else" branch of
 >> the queue length check, like this:
 >
 > except that sis_intr is never called when you use polling :(

Doh. You're right, of course.

A new "else" branch of the corresponding "if" in sis_poll() would fire 
on each poll while the queue is empty, so I guess I'll put the call at 
the end of the "while" loop in sis_txeof(), after the mbuf is freed.

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: netgraph documentation?

2002-06-04 Thread Archie Cobbs

Lars Eggert writes:
> So I ignore the error for now, and make the TCP tunnel as follows:
> 
> Server:
>   /usr/sbin/ngctl mkpeer iface dummy inet
>   /sbin/ifconfig ng0 10.10.10.1 10.10.10.2
>   /usr/sbin/ngctl mkpeer ng0: ksocket inet inet/stream/tcp
>   /usr/sbin/ngctl msg ng0:inet bind inet/127.0.0.1:50505
>   /usr/sbin/ngctl msg ng0:inet listen 1
>   ngctl: send msg: Operation not supported by device
> 
> Client:
>   /usr/sbin/ngctl mkpeer iface dummy inet
>   /sbin/ifconfig ng1 10.10.10.2 10.10.10.1
>   /usr/sbin/ngctl mkpeer ng1: ksocket inet inet/stream/tcp
>   /usr/sbin/ngctl msg ng1:inet bind inet/127.0.0.1:50506
>   /usr/sbin/ngctl msg ng1:inet connect inet/127.0.0.1:50505
>   ngctl: send msg: Operation now in progress
> 
> A tcpdump on lo0 shows the 3-way handshake succeeding:
> 
> [root@hbo: ~larse] tcpdump -i lo0 port 50505
> tcpdump: listening on lo0
> 08:11:29.013658 loopback.50506 > loopback.50505: S 
> 2787661608:2787661608(0) win 65535  1,nop,nop,timestamp 14010458 0,nop,nop,cc 383> (DF)
> 08:11:29.013710 loopback.50505 > loopback.50506: S 
> 1751674938:1751674938(0) ack 2787661609 win 65535  1,nop,nop,timestamp 14010458 14010458,nop,nop,cc 384,nop,nop,ccecho 383>
> 08:11:29.013754 loopback.50506 > loopback.50505: . ack 1 win 32767 
>  (DF)
> 
> Pinging 10.10.10.2 results in:
> 
> [root@hbo: ~larse] ping 10.10.10.2 
>  PING 10.10.10.2 (10.10.10.2): 56 data bytes
> ping: sendto: Socket is not connected
> ping: sendto: Socket is not connected
> ping: sendto: Socket is not connected
> ^C
> --- 10.10.10.2 ping statistics ---
> 3 packets transmitted, 0 packets received, 100% packet loss

I don't think you can have a point-to-point interface whose
remote IP address is also local to your box. In other words,
this may not work on the same machine but it might work if
you use two different machines... can you try that?

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: netgraph documentation?

2002-06-04 Thread Lars Eggert

Archie Cobbs wrote:
> I don't think you can have a point-to-point interface whose
> remote IP address is also local to your box. In other words,
> this may not work on the same machine but it might work if
> you use two different machines... can you try that?

The addresses of the point-to-point interface aren't local to the box, 
the encapsulation ones are. I do this all the time with gifs and tuns, 
and it works fine.

Anyway, I tried it with two machines, and I see the same thing happening:

Ping packets originating on the client make it over the TCP tunnel, and 
the server sends something back (an ICMP reply, from the looks of it).
However, the data gets dropped somewhere after the bpf dumps the packet.

Ping packets originating on the server never enter the tunnel, and I see 
"ping: sendto: Socket is not connected".

A UDP tunnel (like in your example) works fine between the same machines 
using the same addresses.

Please let me know if there's anything I can do to help track this down.

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: Dummynet WFQ

2002-06-04 Thread Luigi Rizzo

On Tue, Jun 04, 2002 at 09:47:22AM -0700, Lars Eggert wrote:
> Luigi Rizzo wrote:
>  >>> BTW if you use polling, you have to be careful in the place
>  >>> where you put the call to if_tx_rdy() to make sure that it
>  >>> catches the tx queue becoming empty only once and not at every
>  >>> polling cycle.
>  >>
>  >> How about at the very end of sis_intr(), as a new "else" branch of
>  >> the queue length check, like this:
>  >
>  > except that sis_intr is never called when you use polling :(
> 
> Doh. You're right, of course.
> 
> A new "else" branch of the corresponding "if" in sis_poll() would fire 
> on each poll while the queue is empty, so I guess I'll put the call at 
> the end of the "while" loop in sis_txeof(), after the mbuf is freed.

and _if_ the mbuf is freed.
In any case it is more a matter of efficiency than of correctness.
Even if you call if_tx_rdy() repeatedly when the device queue is empty
(and the pipe is idle, otherwise at the first occurrence it will
transmit a packet, making the transmit queue non-empty again)
the pipe will not 'accumulate' credits.

cheers
luigi





Re: netgraph documentation?

2002-06-04 Thread Archie Cobbs

Lars Eggert writes:
> > I don't think you can have a point-to-point interface whose
> > remote IP address is also local to your box. In other words,
> > this may not work on the same machine but it might work if
> > you use two different machines... can you try that?
> 
> The addresses of the point-to-point interface aren't local to the box, 
> the encapsulation ones are. I do this all the time with gifs and tuns, 
> and it works fine.
> 
> Anyway, I tried it with two machines, and I see the same thing happening:
> 
> Ping packets originating on the client make it over the TCP tunnel, and 
> the server sends something back (an ICMP reply, from the looks of it).
> However, the data gets dropped somewhere after the bpf dumps the packet.
> 
> Ping packets originating on the server never enter the tunnel, and I see 
> "ping: sendto: Socket is not connected".

Ah yes, now I remember.. the problem is that the listening socket
is not the same socket as the socket for the new connection. E.g.,
notice the way accept(2) works.
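
The accept(2) point can be shown with ordinary user-space sockets. This is a sketch, not ng_ksocket code: `accept_demo()` is a hypothetical helper, it assumes a loopback interface is available, and it assumes the platform defines MSG_NOSIGNAL (used so that a failed send cannot raise SIGPIPE). The errno reported for the listening socket varies by system; on the BSDs it corresponds to the "Socket is not connected" message seen earlier in this thread.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if setup succeeded, -1 otherwise.
 * *listen_errno receives errno from send() on the listening socket
 * (0 if that send unexpectedly succeeded); *accepted_sent receives
 * the result of send() on the socket returned by accept(). */
static int
accept_demo(int *listen_errno, ssize_t *accepted_sent)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int lfd, cfd, afd = -1, rc = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0 || cfd < 0)
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;                    /* let the kernel pick a port */
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) < 0)
		goto out;

	if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		goto out;
	afd = accept(lfd, NULL, NULL);       /* a new, connected socket */
	if (afd < 0)
		goto out;

	/* The listening socket itself never becomes connected. */
	errno = 0;
	*listen_errno = (send(lfd, "x", 1, MSG_NOSIGNAL) < 0) ? errno : 0;

	/* Data flows only on the socket returned by accept(). */
	*accepted_sent = send(afd, "x", 1, MSG_NOSIGNAL);
	rc = 0;
out:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return rc;
}
```

A node that keeps sending on the descriptor it called listen on would therefore keep failing even after the handshake completes, which matches the symptoms on the server side.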

What you want to do is not supported in -stable. You can try applying
these patches from -current:

sys/netgraph/ng_ksocket.c   rev. 1.20
sys/netgraph/ng_ksocket.h   rev. 1.5

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: Problem with SYN cache in FreeBSD 4.5

2002-06-04 Thread Nguyen-Tuong Long Le

On Mon, 3 Jun 2002, Mike Silbersack wrote:

> A few questions:
> 
> 1.  Is this 4.5-release, or 4.5-stable (aka 4.6-RC2)?  4.5-release had a
> few bugs in the syn cache which could cause crashes.
> 
> 2.  Are you using accept filters?  Accept filters act oddly on
> 4.5-release, you'll have to upgrade to 4.5-stable/4.6.
> 
> 3.  Could you use tcpdump to determine what exactly is going wrong and
> post a url to the log so that we can investigate what is going wrong?

Hi all,

Thanks for all your suggestions. I've tried out all of them but
they unfortunately didn't fix the problem.

I have this problem with 4.5-RELEASE. I cvsup'ed the source tree and
tried 4.5-RELEASE-p6 and 4.6-RC #1 but they didn't fix the problem.
Setting net.inet.tcp.syncookies didn't help either.
 
I use poll(). I don't use accept filters.

I instrumented some code in tcp_input() that seems to indicate
that lots of TCP segments (about 42000 segments in 10 minutes)
are dropped because syncache_expand() returns 0. This in turn
is caused because syncache_lookup() and syncookie_lookup()
return NULL. Why this is happening is beyond my knowledge.

I took a 10-minute tcpdump trace and put it up at
www.cs.unc.edu/~le/tmp/ti.dump.gz (It's actually two one-way
tcpdump traces taken at the fiber tap next to the server.
I used tcpslice from tcpdump.org to merge them. I checked that all
packets from the two one-way traces are in the merged trace and
they seem to be sorted in timestamp order). Here is a typical
sequence of exchanges for connections that are reset by the server.
The server ack's the first SYN by a SYN/ACK but doesn't ack any
segment after that.

18:11:33.461574 152.2.135.14.1827 > 152.2.136.39.6789: S 171182594:171182594(0) win 16384 (DF)
18:11:33.760675 152.2.136.39.6789 > 152.2.135.14.1827: S 4246482515:4246482515(0) ack 171182595 win 16384
18:11:33.761000 152.2.135.14.1827 > 152.2.136.39.6789: . ack 1 win 17376 (DF)
18:11:33.761390 152.2.135.14.1827 > 152.2.136.39.6789: P 1:5(4) ack 1 win 17376 (DF)
18:11:33.761586 152.2.135.14.1827 > 152.2.136.39.6789: P 5:13(8) ack 1 win 17376 (DF)
18:11:33.762172 152.2.135.14.1827 > 152.2.136.39.6789: P 13:293(280) ack 1 win 17376 (DF)
18:11:34.031741 152.2.136.39.6789 > 152.2.135.14.1827: R 4246482516:4246482516(0) win 0
18:11:34.046008 152.2.136.39.6789 > 152.2.135.14.1827: R 4246482516:4246482516(0) win 0
18:11:34.060284 152.2.136.39.6789 > 152.2.135.14.1827: R 4246482516:4246482516(0) win 0

Any hint or suggestion would be very much appreciated.

Thanks,
-- long





Patch for review: source VIPA

2002-06-04 Thread Marko Zec

Below is a patch that enables all outgoing sessions to always use the
same source IP address by default, no matter what outbound interface is
used. If on a multi-homed host the source IP address always originates
from an "always-up" internal virtual interface, then the established TCP
sessions won't break in case of failure of one of the physical
interfaces, or similar serious network topology changes.
The idea is not new, as far as I know it was introduced a couple of
years ago in OS/390 V2R5 TCP/IP implementation, if not earlier. IBM
called this feature "Virtual IP Address - VIPA", so just for fun I also
borrowed the name along with the idea...
Anyway, after applying the patch (against 4.6-RC3 kernel source), a new
sysctl variable net.inet.ip.sourcevipa becomes available, which can be
set to an existing IP address of desired internal interface. Although
any interface can be used for source-VIPA feature, the patch also
provides a new interface type "vipa", which is nothing else than a
standard loopback ifc, but without the loopback flag set, so that it can
be conveniently advertised via the routed daemon. The following line
should be added to your kernel config file in order to make the vipa ifc
available:

pseudo-device   vipa    1

Here is a configuration example:

vmbsd# routed
vmbsd# ifconfig
lnc0: flags=8843 mtu 1500
        inet 192.168.201.143 netmask 0xff00 broadcast 192.168.201.255
        ether 00:50:56:ac:c9:7a
lnc1: flags=8843 mtu 1500
        inet 192.168.202.143 netmask 0xff00 broadcast 192.168.202.255
        ether 00:50:56:ac:c9:8c
lo0: flags=8049 mtu 16384
        inet 127.0.0.1 netmask 0xff00
vipa0: flags=61 mtu 1500
        inet 192.168.1.1 netmask 0x
vmbsd# sysctl net.inet.ip.sourcevipa
net.inet.ip.sourcevipa: none
vmbsd# sysctl net.inet.ip.sourcevipa=192.168.1.1
net.inet.ip.sourcevipa: none -> 192.168.1.1
vmbsd#
vmbsd# telnet 192.168.201.10
[cut]
%who am i
markottyp2   Jun  5 01:37   (192.168.1.1)

Have fun!


--- netinet/in_pcb.c        Thu May  2 04:36:50 2002
+++ netinet/in_pcb.c.vipa   Wed Jun  5 00:14:19 2002
@@ -122,6 +122,50 @@ SYSCTL_PROC(_net_inet_ip_portrange, OID_
 SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, hilast, CTLTYPE_INT|CTLFLAG_RW,
   &ipport_hilastauto, 0, &sysctl_net_ipport_check, "I", "");
 
+
+static char sourcevipastr[16] = "none";
+static u_long sourcevipa=0;
+
+static int
+sourcevipa_sysctl(SYSCTL_HANDLER_ARGS)
+{
+   int error;
+   u_char i,j;
+   u_long addr=0;
+   char *c;
+   
+   error = sysctl_handle_string(oidp,
+   oidp->oid_arg1, oidp->oid_arg2, req);
+   if (error)
+   return error;
+   c=sourcevipastr;
+   for (i=0;i<4;i++) {
+   j=0;
+   while (*c>='0'&&*c<='9') {
+   j*=10;
+   j+=*c-'0';
+   c++;
+   }
+   if (*c!=0)
+   c++;
+   addr=(addr<<8)+j;
+   }
+   sourcevipa=addr;
+   if (addr) {
+   unsigned char *ucp=(u_char *)&addr;
+   sprintf(sourcevipastr, "%d.%d.%d.%d",
+   ucp[0], ucp[1], ucp[2], ucp[3]);
+   } else {
+   sprintf(sourcevipastr, "none");
+   }
+   return 0;
+}
+
+SYSCTL_PROC(_net_inet_ip, OID_AUTO, sourcevipa, CTLTYPE_STRING|CTLFLAG_RW,
+   sourcevipastr, sizeof(sourcevipastr), &sourcevipa_sysctl, "A",
+   "Always try to use this IP address as source");
+
+
 /*
  * in_pcb.c: manage the Protocol Control Blocks.
  *
@@ -438,7 +482,25 @@ in_pcbladdr(inp, nam, plocal_sin)
 * to our address on another net goes to loopback).
 */
if (ro->ro_rt && !(ro->ro_rt->rt_ifp->if_flags & IFF_LOOPBACK))
-   ia = ifatoia(ro->ro_rt->rt_ifa);
+   {
+   /*
+* Use source VIPA, if available
+* M. Zec ([EMAIL PROTECTED]) 2002/06/24
+*/
+   if (sourcevipa) {
+   u_long addr = sin->sin_addr.s_addr;
+   u_short fport = sin->sin_port;
+   
+   sin->sin_addr.s_addr = htonl(sourcevipa);
+   sin->sin_port = 0;
+   ia = ifatoia(ifa_ifwithaddr(sintosa(sin)));
+   sin->sin_addr.s_addr = addr;
+   sin->sin_port = fport;
+   }
+
+   if (ia == 0)
+   ia = ifatoia(ro->ro_rt->rt_ifa);
+   }
if (ia == 0) {
u_short fport = sin->sin_port;
 
--- net/if_loop.c   Thu Dec 20 11:30:16 2001
+++ net/if_loop.c.vipa  Wed Jun  5 00:16:40 2002
@@ -38,6 +38,7 @@
  * Loopback interface driver for protocol testing and timing.
  */
 #include "loop.h"
+
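
The dotted-quad handling that the sysctl handler in the patch above performs can be sketched in user space as follows. This is an illustration, not part of the patch: `parse_dotted_quad()` and `format_dotted_quad()` are hypothetical names. Unlike the loop in the patch, each octet is range-checked rather than accumulated in a u_char, and the string is rebuilt with shifts so the result does not depend on the byte order of the machine.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse "a.b.c.d" into a host-order 32-bit value.
 * Returns 0 on success, -1 on malformed input. */
static int
parse_dotted_quad(const char *s, uint32_t *out)
{
	uint32_t addr = 0;

	for (int i = 0; i < 4; i++) {
		unsigned int octet = 0;
		int digits = 0;

		while (*s >= '0' && *s <= '9') {
			octet = octet * 10 + (unsigned int)(*s - '0');
			if (octet > 255)
				return -1;   /* octet out of range */
			s++;
			digits++;
		}
		if (digits == 0)
			return -1;           /* empty or non-numeric octet */
		if (i < 3) {
			if (*s != '.')
				return -1;   /* missing separator */
			s++;
		}
		addr = (addr << 8) | octet;
	}
	if (*s != '\0')
		return -1;                   /* trailing garbage */
	*out = addr;
	return 0;
}

/* Format a host-order address; buf should hold at least 16 bytes. */
static void
format_dotted_quad(uint32_t addr, char *buf, size_t buflen)
{
	snprintf(buf, buflen, "%u.%u.%u.%u",
	    (unsigned int)(addr >> 24) & 0xff,
	    (unsigned int)(addr >> 16) & 0xff,
	    (unsigned int)(addr >> 8) & 0xff,
	    (unsigned int)addr & 0xff);
}
```

In this sketch a value such as "none" is rejected as malformed; a caller could treat that case as "feature disabled", which is how the patch interprets an address of zero.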

Re: Problem with SYN cache in FreeBSD 4.5

2002-06-04 Thread jayanth

Can you post the output of  netstat -s -p tcp  ?
I'd like to check for listen queue overflows and syncache bucket overflows.

jayanth

Nguyen-Tuong Long Le ([EMAIL PROTECTED]) wrote:
> On Mon, 3 Jun 2002, Mike Silbersack wrote:
> 
> > A few questions:
> > 
> > 1.  Is this 4.5-release, or 4.5-stable (aka 4.6-RC2)?  4.5-release had a
> > few bugs in the syn cache which could cause crashes.
> > 
> > 2.  Are you using accept filters?  Accept filters act oddly on
> > 4.5-release, you'll have to upgrade to 4.5-stable/4.6.
> > 
> > 3.  Could you use tcpdump to determine what exactly is going wrong and
> > post a url to the log so that we can investigate what is going wrong?
> 
> Hi all,
> 
> Thanks for all your suggestions. I've tried out all of them but
> they unfortunately didn't fix the problem.
> 
> I have this problem with 4.5-RELEASE. I cvsup'ed the source tree and
> tried 4.5-RELEASE-p6 and 4.6-RC #1 but they didn't fix the problem.
> Setting net.inet.tcp.syncookies didn't help either.
>  
> I use poll(). I don't use accept filters.
> 
> I instrumented some code in tcp_input() that seems to indicate
> that lots of TCP segments (about 42000 segments in 10 minutes)
> are dropped because syncache_expand() returns 0. This in turn
> happens because syncache_lookup() and syncookie_lookup()
> return NULL. Why this is happening is beyond my knowledge.
> 
> I took a 10-minute tcpdump trace and put it up at
> www.cs.unc.edu/~le/tmp/ti.dump.gz (It's actually two one-way
> tcpdump traces taken at the fiber tap next to the server.
> I used tcpslice from tcpdump.org to merge them. I checked that all
> packets from the two one-way traces are in the merged trace and
> they seem to be sorted in timestamp order). Here is a typical
> sequence of exchanges for connections that are reset by the server.
> The server ack's the first SYN by a SYN/ACK but doesn't ack any
> segment after that.
> 
> 18:11:33.461574 152.2.135.14.1827 > 152.2.136.39.6789: S 171182594:171182594(0) win 16384 (DF)
> 18:11:33.760675 152.2.136.39.6789 > 152.2.135.14.1827: S 4246482515:4246482515(0) ack 171182595 win 16384
> 18:11:33.761000 152.2.135.14.1827 > 152.2.136.39.6789: . ack 1 win 17376 (DF)
> 18:11:33.761390 152.2.135.14.1827 > 152.2.136.39.6789: P 1:5(4) ack 1 win 17376 (DF)
> 18:11:33.761586 152.2.135.14.1827 > 152.2.136.39.6789: P 5:13(8) ack 1 win 17376 (DF)
> 18:11:33.762172 152.2.135.14.1827 > 152.2.136.39.6789: P 13:293(280) ack 1 win 17376 (DF)
> 18:11:34.031741 152.2.136.39.6789 > 152.2.135.14.1827: R 4246482516:4246482516(0) win 0
> 18:11:34.046008 152.2.136.39.6789 > 152.2.135.14.1827: R 4246482516:4246482516(0) win 0
> 18:11:34.060284 152.2.136.39.6789 > 152.2.135.14.1827: R 4246482516:4246482516(0) win 0
> 
> Any hint or suggestion would be very much appreciated.
> 
> Thanks,
> -- long
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
> 
> 

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-net" in the body of the message



Re: Problem with SYN cache in FreeBSD 4.5

2002-06-04 Thread Mike Silbersack


On Tue, 4 Jun 2002, jayanth wrote:

> Can you dump the output of  netstat -s -p tcp  ?
> Checking for listen queue overflows and syncache bucket overflows.
>
> jayanth

And "netstat -La" too, please.  I'm interested in whether you're accepting
sockets fast enough.

Mike "Silby" Silbersack




Re: Problem with SYN cache in FreeBSD 4.5

2002-06-04 Thread Nguyen-Tuong Long Le

Hi,

On Tue, 4 Jun 2002, Mike Silbersack wrote:

> 
> On Tue, 4 Jun 2002, jayanth wrote:
> 
> > Can you dump the output of  netstat -s -p tcp  ?
> > Checking for listen queue overflows and syncache bucket overflows.
> >
> > jayanth
> 

Here is the output of "netstat -s -p tcp".

tcp:
    264136 packets sent
        104124 data packets (127937368 bytes)
        286 data packets (344573 bytes) retransmitted
        0 resends initiated by MTU discovery
        147203 ack-only packets (127 delayed)
        0 URG only packets
        2 window probe packets
        3 window update packets
        12518 control packets
    747675 packets received
        85799 acks (for 127931017 bytes)
        3247 duplicate acks
        0 acks for unsent data
        83750 packets (11689418 bytes) received in-sequence
        89 completely duplicate packets (5552 bytes)
        0 old duplicate packets
        0 packets with some dup. data (0 bytes duped)
        2 out-of-order packets (796 bytes)
        0 packets (0 bytes) of data after window
        0 window probes
        30312 window update packets
        0 packets received after close
        0 discarded for bad checksums
        0 discarded for bad header offset fields
        0 discarded because packet too short
    14 connection requests
    17243 connection accepts
    372055 bad connection attempts
    87501 listen queue overflows
    17257 connections established (including accepts)
    16695 connections closed (including 1 drop)
        830 connections updated cached RTT on close
        830 connections updated cached RTT variance on close
        178 connections updated cached ssthresh on close
    0 embryonic connections dropped
    85798 segments updated rtt (of 50455 attempts)
    201 retransmit timeouts
        0 connections dropped by rexmit timeout
    0 persist timeouts
        0 connections dropped by persist timeout
    0 keepalive timeouts
        0 keepalive probes sent
        0 connections dropped by keepalive
    376 correct ACK header predictions
    61279 correct data packet header predictions
    104747 syncache entries added
        189 retransmitted
        285 dupsyn
        0 dropped
        17243 completed
        0 bucket overflow
        0 cache overflow
        3 reset
        0 stale
        87501 aborted
        0 badack
        0 unreach
        0 zone failures
    0 cookies sent
    0 cookies received


> And "netstat -La" too, please.  I'm interested in whether you're accepting
> sockets fast enough.

Here is the output of "netstat -La"

Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address 
tcp4  3/0/8192   *.6789 


I wonder why the listen queue overflows when there are so few
connections in the queue. The number of listen queue overflows
is equal to the number of syncache aborts. Is it a coincidence
or are they related?

Thanks,
-- long




Re: Problem with SYN cache in FreeBSD 4.5

2002-06-04 Thread Mike Silbersack


On Tue, 4 Jun 2002, Nguyen-Tuong Long Le wrote:

> Here is the output of "netstat -La"
>
> Current listen queue sizes (qlen/incqlen/maxqlen)
> Proto Listen Local Address
> tcp4  3/0/8192   *.6789
>
>
> I wonder why the listen queue overflows when there are so few
> connections in the queue. The number of listen queue overflows
> is equal to the number of syncache aborts. Is it a coincidence
> or are they related?
>
> Thanks,
> -- long

It appears that the primary reason a syncache abort would occur is that
the system has run out of sockets.  Is kern.ipc.numopensockets approaching
kern.ipc.maxsockets?

Mike "Silby" Silbersack
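
For anyone wanting to check whether the two counters discussed above
move together on their own box, here is a minimal sketch in C. It
assumes the output of "netstat -s -p tcp" has been saved into a string,
and that each counter line has the form "<count> <label>" as in the
output posted in this thread. The helper names find_counter() and
overflows_match_aborts() are made up for illustration; they are not part
of netstat or the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look for a line of the form "<count> <label>..." in saved
 * "netstat -s -p tcp" output and store the count in *out.
 * Returns 1 if such a line is found, 0 otherwise.
 * Hypothetical helper; the line format is an assumption taken
 * from the output quoted in this thread.
 */
static int
find_counter(const char *text, const char *label, long *out)
{
	const char *line = text;

	assert(text != NULL && label != NULL && out != NULL);

	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = (eol != NULL) ? (size_t)(eol - line) : strlen(line);
		char buf[256];
		long value;
		int used = 0;

		/* Copy one line into a bounded, NUL-terminated buffer. */
		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';

		/* Skip indentation, read the count, note where the label starts. */
		if (sscanf(buf, " %ld %n", &value, &used) == 1 && used > 0 &&
		    strncmp(buf + used, label, strlen(label)) == 0) {
			*out = value;
			return (1);
		}
		if (eol == NULL)
			break;
		line = eol + 1;
	}
	return (0);
}

/*
 * Returns 1 if "listen queue overflows" equals the syncache "aborted"
 * count, 0 if they differ, -1 if either line is missing.
 */
static int
overflows_match_aborts(const char *text)
{
	long overflows, aborted;

	if (!find_counter(text, "listen queue overflows", &overflows) ||
	    !find_counter(text, "aborted", &aborted))
		return (-1);
	return (overflows == aborted);
}
```

Fed the output posted earlier in this thread, both lookups yield 87501,
so overflows_match_aborts() reports a match. Equal counters only suggest
a common cause; they do not prove one.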




Re: Problem with SYN cache in FreeBSD 4.5

2002-06-04 Thread Nguyen-Tuong Long Le

> It appears that the primary reason a syncache abort would occur is that
> the system has run out of sockets.  Is kern.ipc.numopensockets approaching
> kern.ipc.maxsockets?

Setting kern.ipc.maxsockets works like a charm. Thanks! I forgot to set
it when I upgraded my system from 4.3 to 4.5-RELEASE. My bad. Thanks again!

I couldn't find the variable kern.ipc.numopensockets. Does it exist
in 4.5 or any earlier release?

Thanks again,
-- long




Re: netgraph documentation?

2002-06-04 Thread Brian Somers

On Tue, 4 Jun 2002 10:13:17 -0700 (PDT), Archie Cobbs <[EMAIL PROTECTED]> wrote:
[.]
> I don't think you can have a point-to-point interface whose
> remote IP address is also local to your box. In other words,
> this may not work on the same machine but it might work if
> you use two different machines... can you try that?
> 
> -Archie

It's ok to do this.  I run ppp back-to-back with itself using

  set device "!ppp -direct in"

for testing.  In fact, there are examples in ppp.conf.sample :)

-- 
Brian <[EMAIL PROTECTED]>   <[EMAIL PROTECTED]>
  
Don't _EVER_ lose your sense of humour !   
