Re: kern/122295: [bge] bge Ierr rate increase (since 6.0R) [regression]

2008-06-25 Thread Manuel Kasper
The following reply was made to PR kern/122295; it has been noted by GNATS.

From: "Manuel Kasper" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Subject: Re: kern/122295: [bge] bge Ierr rate increase (since 6.0R) [regression]
Date: Wed, 25 Jun 2008 09:48:29 +0200

 We've been experiencing the same issue with BCM5704 B0 in HP ProLiant
 DL360 G4 servers. The Ierrs are correlated with packet loss (which is
 why we noticed the problem in the first place); however for us, the
 patch in 
 completely fixes the problem and doesn't seem to introduce any problems
 with link state detection (cable disconnect/reconnect, changing link
 speed on remote end etc. all work fine).
 
 Also, OpenBSD already has essentially the same fix (with some dubious
 style changes) in its repository:
 
 
 The problem appears in both FreeBSD 6.3-RELEASE and 7.0-RELEASE. This is
 how things look without the fix (regardless of what link speed is used):
 
 
 Router#ping 192.168.4.1 repeat 1000 size 1500
 
 Type escape sequence to abort.
 Sending 1000, 1500-byte ICMP Echos to 192.168.4.1, timeout is 2 seconds:
 !!
 !!
 !!
 !!
 !!
 !!
 !!
 !!
 !!.!.!.!.!.!.!.!.!.!.!.!.!.!.!.!.!.!!!
 !!
 !!
 !!
 !!
 !!
 
 Success rate is 98 percent (983/1000), round-trip min/avg/max = 1/1/4 ms
 
 
 -> Pings from Cisco routers are especially likely to show the issue, as
 apparently mii_tick() and the pings from the Cisco occur synchronously
 for a while. TCP throughput isn't affected very much.
 
 Related dmesg output:
 
 bge0:  mem 0xfdd7-0xfdd7
 irq 25 at device 2.0 on pci2
 miibus1:  on bge0
 brgphy0:  on miibus1
 brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX,
 1000baseTX-FDX, auto
 bge0: Ethernet address: 00:18:71:e4:xx:xx
 
 pciconf -lv:
 
 [EMAIL PROTECTED]:2:0: class=0x02 card=0x00d00e11 chip=0x164814e4 rev=0x10 hdr=0x00
 vendor   = 'Broadcom Corporation'
 device   = 'BCM5704 NetXtreme Dual Gigabit Adapter'
 class    = network
 subclass = ethernet
 
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Scott Ullrich
On Tue, Jun 24, 2008 at 11:54 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> On Tue, 24 Jun 2008 22:01:46 -0500
> mgrooms <[EMAIL PROTECTED]> wrote:
>
>> Is anyone currently looking at the IPsec NAT-T patches? I posted a similar
>> question several months ago around the FAST_IPSEC + IPv6 integration time
>> frame. Maybe now that things have settled a bit, this work can be reviewed
>> and possibly committed?
>
> +1

Both m0n0wall and pfSense also use NAT-T.   It sure would be nice to
have it in FreeBSD so we can discontinue our patching every time we
move to a newer FreeBSD revision.

Scott


FreeBSD 7.0: sockets stuck in CLOSED state...

2008-06-25 Thread Ali Niknam

Dear All,

Recently I've been upgrading some of my machines from FreeBSD 6.x amd64 
to FreeBSD 7.0 amd64.


After upgrading I noticed a weird error/bug. It seems that after several 
thousand TCP connections some seem to hang in 'CLOSED' state.


netstat -n gives:
...
tcp4  0   0  1.2.3.4.*  4.5.6.7.42149   CLOSED
tcp4  39  0  1.2.3.4.*  4.5.6.7.54103   CLOSED
tcp4  35  0  1.2.3.4.*  4.5.6.7.41718   CLOSED
tcp4  38  0  1.2.3.4.*  4.5.6.7.55618   CLOSED
tcp4  41  0  1.2.3.4.*  4.5.6.7.44230   CLOSED
tcp4  39  0  1.2.3.4.*  4.5.6.7.49439   CLOSED
...

These never go away; they gradually increase until the application 
starts giving errors (probably because some socket or file descriptor 
limit is reached). When the application is killed these entries 
disappear.


The application in question is a self-written, multithreaded DNS server 
that has been running fine for years without any trouble on both BSD 5.x 
and 6.x, in 32-bit as well as 64-bit builds on 6.x.


Of course that doesn't mean that the application is error free; however, 
after extensive testing I really cannot find anything wrong with the 
application itself, so I'm thinking maybe there's a change somewhere 
that causes this? I know that tcp/network has been completely redone...


What basically happens in the application is this:
 - one main tcp thread runs an infinite while loop waiting for new 
connections to arrive
 - as soon as one arrives a new thread is spawned that handles the 
newly created stream

 - it reads some bytes, writes some bytes, then closes it
 - thread exits

What appears to happen is this: after the new thread is spawned it tries 
to read 2 bytes (DNS tcp length information). It gets back 0 bytes (EOF) 
and therefore closes the socket and calls pthread_exit. However, in 
netstat that same stream often appears to have bytes 'stuck' in the in 
queue...
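
The read step described above can be sketched as follows. This is a minimal illustration in Python, not the application's own code (which the message does not show); the helper names read_exact and read_dns_tcp_message are hypothetical. It shows the two-byte length prefix being read in full and an EOF on that read being reported to the caller, which is then expected to close the socket.

```python
import socket
import struct


def read_exact(conn, n):
    """Read exactly n bytes, or return None if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:  # recv() returned 0 bytes: the peer sent EOF
            return None
        buf += chunk
    return buf


def read_dns_tcp_message(conn):
    """Read one length-prefixed DNS message; None means EOF was seen."""
    header = read_exact(conn, 2)  # 2-byte big-endian length prefix
    if header is None:
        return None
    (length,) = struct.unpack("!H", header)
    return read_exact(conn, length)
```

A single recv() may return fewer bytes than requested, so the loop in read_exact matters even for a two-byte header.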


I really can't see how this can cause hanging sockets in 'CLOSED' state. 
Even if the incoming queue isn't read entirely, a call to close should 
close it. Also I really can't find any documentation in netstat, or 
elsewhere, about the 'CLOSED' state...



Any help would be greatly appreciated!


Kind Regards,


Ali Niknam


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Julian Elischer

Scott Ullrich wrote:
> On Tue, Jun 24, 2008 at 11:54 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
>> On Tue, 24 Jun 2008 22:01:46 -0500
>> mgrooms <[EMAIL PROTECTED]> wrote:
>>
>>> Is anyone currently looking at the IPsec NAT-T patches? I posted a similar
>>> question several months ago around the FAST_IPSEC + IPv6 integration time
>>> frame. Maybe now that things have settled a bit, this work can be reviewed
>>> and possibly committed?
>>
>> +1
>
> Both m0n0wall and pfSense also use NAT-T.   It sure would be nice to
> have it in FreeBSD so we can discontinue our patching every time we
> move to a newer FreeBSD revision.
>
> Scott



where is the patch?



Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Scott Ullrich
On Wed, Jun 25, 2008 at 2:36 PM, Julian Elischer <[EMAIL PROTECTED]> wrote:
>
>
> where is the patch?
>
>

The version that we use in RELENG_7_0 is located here:
http://cvs.pfsense.org/cgi-bin/cvsweb.cgi/tools/patches/RELENG_7_0/patch-natt-freebsd7-2008-03-11.diff?rev=1.1;content-type=text%2Fplain

Thanks Julian!

Scott


Re: FreeBSD 7.0: sockets stuck in CLOSED state...

2008-06-25 Thread Ali Niknam

>   This looks like an issue we used to have at work, where a streaming
> application suddenly started getting kevents for sockets that had been
> already closed. While that was happening, a netstat output looked just
> like yours. We never tracked it down, as we moved to other projects :(




Was that BSD 7.0 also?

--
  Transip BV | http://www.transip.nl/
  We never let you down.


Re: FreeBSD 7.0: sockets stuck in CLOSED state...

2008-06-25 Thread Vlad GALU
On 6/25/08, Ali Niknam <[EMAIL PROTECTED]> wrote:
> Dear All,
>
>  Recently i've been upgrading some of my machines from FreeBSD 6.x amd64 to
> FreeBSD 7.0 amd64.
>
>  After upgrading I noticed a weird error/bug. It seems that after several
> thousand TCP connections some seem to hang in 'CLOSED' state.
>
>  netstat -n gives:
>  ...
>  tcp4  0   0  1.2.3.4.*  4.5.6.7.42149   CLOSED
>  tcp4  39  0  1.2.3.4.*  4.5.6.7.54103   CLOSED
>  tcp4  35  0  1.2.3.4.*  4.5.6.7.41718   CLOSED
>  tcp4  38  0  1.2.3.4.*  4.5.6.7.55618   CLOSED
>  tcp4  41  0  1.2.3.4.*  4.5.6.7.44230   CLOSED
>  tcp4  39  0  1.2.3.4.*  4.5.6.7.49439   CLOSED
>  ...
>
>  These never go away; they gradually increase and increase until the
> application starts giving errors (probably because some socket or
> filedescriptor limit is reached). When the application is killed these
> entries disappear.
>
>  The application in question is a self written DNS server, multithreaded,
> and running fine for years without any troubles on both BSD 5.x as well as
> 6.x. Also 32bits as well as 64bits on 6.x.
>
>  Ofcourse that doesn't mean that the application is error free, however,
> after doing extensive testing I really can not find anything wrong with the
> application itself, so I'm thinking maybe there's a change somewhere that
> causes this? I know that tcp/network has been completely redone...
>
>  What basically happens in the application is this:
>   - one main tcp thread runs an infinite while loop waiting for new
> connections to arrive
>   - as soon as one arrives a new thread is spawned that handles the newly
> created stream
>   - it reads some bytes, writes some bytes, then closes it
>   - thread exits
>
>  What appears to happen is this: after the new thread is spawned it tries to
> read 2 bytes (DNS tcp length information). It gets back 0 bytes (EOF) and
> therefore closes the sockets and calls pthread_exit. However in netstat that
> same stream oftenly appears to have bytes 'stuck' in the in queue...
>
>  I really can't see how this can cause hanging sockets in 'CLOSED' state.
> Even if the incoming queue isnt read entirely a call to close should close
> it. Also I really can't find any documentation in netstat, or elsewhere,
> about the 'CLOSED' state...
>
>
>  Any help would greatly be appreciated!
>
>
>  Kind Regards,
>
>
>  Ali Niknam
>  ___


   This looks like an issue we used to have at work, where a streaming
application suddenly started getting kevents for sockets that had been
already closed. While that was happening, a netstat output looked just
like yours. We never tracked it down, as we moved to other projects :(


-- 
~/.signature: no such file or directory


Re: FreeBSD 7.0: sockets stuck in CLOSED state...

2008-06-25 Thread Vlad GALU
On 6/25/08, Ali Niknam <[EMAIL PROTECTED]> wrote:
> >   This looks like an issue we used to have at work, where a streaming
> > application suddenly started getting kevents for sockets that had been
> > already closed. While that was happening, a netstat output looked just
> > like yours. We never tracked it down, as we moved to other projects :(
> >
> >
> >
>
>  Was that BSD 7.0 also?

   Yes.

>
>  --
>   Transip BV | http://www.transip.nl/
>   We never let you down.
>


-- 
~/.signature: no such file or directory


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Julian Elischer

Scott Ullrich wrote:
> On Wed, Jun 25, 2008 at 2:36 PM, Julian Elischer <[EMAIL PROTECTED]> wrote:
>>
>> where is the patch?
>
> The version that we use in RELENG_7_0 is located here:
> http://cvs.pfsense.org/cgi-bin/cvsweb.cgi/tools/patches/RELENG_7_0/patch-natt-freebsd7-2008-03-11.diff?rev=1.1;content-type=text%2Fplain
>
> Thanks Julian!

I don't have time to do a lot of work on it, but if you can get me a
patch that applies cleanly on -current
and that you have tested, along with testing other cases (e.g. not
compiled in)
then I can give it a look over and if it looks  ok I can commit it
to -current and then MFC it back to 7 in a week's time or so.
is it ABI compatible?

> Scott




Re: FreeBSD 7.0: sockets stuck in CLOSED state...

2008-06-25 Thread Robert Watson


On Wed, 25 Jun 2008, Ali Niknam wrote:

> Recently i've been upgrading some of my machines from FreeBSD 6.x amd64 to
> FreeBSD 7.0 amd64.
>
> After upgrading I noticed a weird error/bug. It seems that after several
> thousand TCP connections some seem to hang in 'CLOSED' state.


Sounds like there's a bug somewhere.  Before we start trying to track it down, 
I'll tell you a little more about how this works so that we can interpret the 
output you're seeing.


In FreeBSD, as with all UNIX/Berkeley sockets systems, each socket is actually 
represented by a set of data structures representing different layers of 
abstraction.  At the top level is struct file, representing a file descriptor. 
Next down is struct socket, representing a socket.  Then the protocol code has 
struct inpcb, representing a generic IP connection, and struct tcpcb (or 
struct tcptw once we enter TIMEWAIT), representing a TCP connection. 
Confusingly, these data structures don't always exist all at once.  For 
example, if you close the file descriptor, freeing struct file, the socket and 
protocol state may persist for some time until the TCP connection closes (all 
data has been sent, or various other close modes).


One important difference between FreeBSD 6.x and FreeBSD 7.x is that, in 
FreeBSD 7.x, we've reduced the degree to which these data structures exist in 
isolation.  If you look at the mailing list threads discussing the change, 
you'll see it described as "strengthening invariants".  The most important 
part of the change was making it an invariant that so->so_pcb, the pointer 
from the socket to the protocol layer state, always remains stable and valid. 
This had a number of benefits: because the pointer is always stable, it no 
longer requires locks to follow, lowering overhead and improving 
parallelism.  It also simplifies the code by removing lots of error handling, 
and improves code stability by avoiding the inevitable bugs associated with 
complex error handling.  If you look at bug reports over the years, we've had 
quite a few panics reported (and fixed) that were caused by the disappearance 
of protocol layer state, such as when a connection is reset while still in use 
by a process, and these are now all believed to be eliminated.


So the code is faster, cleaner, and more stable.  But there are a few 
interesting side effects.  One is that we retain state at the TCP layer for 
longer than we used to.  Specifically, if a TCP connection closes, the inpcb 
remains allocated until the file descriptor is closed (i.e., the application 
notices the connection has closed and invokes close() on the file descriptor). 
This has a few impacts: one is that TCP connections now appear in netstat in 
the CLOSED state for longer than before, and another is that open sockets that 
are associated with CLOSED TCP connections now count against the global 
resource limit on the number of simultaneous TCP connections.


I say "longer than before", but I should be clear that, in practice, assuming 
all is working properly, there's no measurable behavioral change *except* for 
improved performance, cleanliness, and stability.  This is because 
applications generally open a socket, run a protocol, and when the protocol 
wraps up, they then close() the file descriptor in order to close the 
connection.


So, with that introduction, we're interested in resolving:

(1) Is this an application bug (leaking file descriptors) that only manifests
in 7.x due to changes in kernel state management, leading to the sockets
being visible in netstat and counting against the resource limit?

(2) Is this a *new* bug in TCP in 7.x, perhaps a result of the state-related
changes I've described?

(3) Is this an *old* bug in TCP that is only now manifesting because of the
changes in kernel state management?

The first is the easiest to resolve, as all we need to do is see whether the 
number of file descriptors for the application goes upwards in an improbable 
manner.  You can use fstat, procstat, sockstat, or various other tools (such 
as lsof) to see whether the process is leaking file descriptors.  You can also 
instrument your application to keep track of the file descriptor numbers being 
returned to see whether, perhaps, that number only goes up over time, and gets 
really big.
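
The "number only goes up" symptom follows from the POSIX rule that a newly opened descriptor gets the lowest number not currently in use. A minimal sketch of that effect, in Python rather than the application's language, with the hypothetical helper fd_numbers standing in for real instrumentation:

```python
import os


def fd_numbers(count, leak):
    """Open /dev/null `count` times and return the descriptor numbers seen.

    With leak=False each descriptor is closed before the next open.
    With leak=True they are held until the end, imitating a missing close().
    """
    seen, held = [], []
    for _ in range(count):
        fd = os.open(os.devnull, os.O_RDONLY)
        seen.append(fd)
        if leak:
            held.append(fd)
        else:
            os.close(fd)
    for fd in held:  # tidy up so the demonstration itself leaks nothing
        os.close(fd)
    return seen


closing = fd_numbers(50, leak=False)
leaking = fd_numbers(50, leak=True)
print("distinct fds when closing:", len(set(closing)))
print("distinct fds when leaking:", len(set(leaking)))
```

When every descriptor is closed, the same small numbers are reused; when they are held, each open returns a new, larger number. Logging the value returned by accept() in the real application would show which pattern applies.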


If it turns out that your application *is* properly closing sockets, then we 
need to decide if perhaps we're looking at a race in close and state 
management.  In particular, I'll need the output of "netstat -na", "vmstat 
-z", and "vmstat -m" from the machine once it's in its rather wedged-up state. 
It would be most helpful if you could actually shut down to single-user mode, 
killing all user processes, then waiting ten minutes, and capturing the output 
of those above commands to files that you can then e-mail to me.


Without accusing you of having buggy code, I should say that I think there's a 
reasonable chance that what you're seeing is an interaction between a

Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Scott Ullrich
On Wed, Jun 25, 2008 at 3:33 PM, Yuri Lukin <[EMAIL PROTECTED]> wrote:
> I believe the original author of the patch has one that should work with 
> current:
>
> http://vanhu.free.fr/FreeBSD/

Even better.

Looks like http://vanhu.free.fr/FreeBSD/patch-natt-freebsd-HEAD-2008-03-19.diff
might be semi up to date.

Thanks a million for assisting Julian, this is going to make a lot of
folks happy! :)

Scott


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Eric Masson
Julian Elischer <[EMAIL PROTECTED]> writes:

Hi,

> where is the patch?

It seems that the last patch to -current is available here :
http://vanhu.free.fr/FreeBSD/patch-natt-freebsd-HEAD-2008-03-19.diff

Maybe Yvan has a more recent patch available (CCed)

-- 
 These are only proposals. I don't want to force them through.
 I think that if my ideas are to be taken up, they should not
 be put to a vote, for several reasons:
 -+- BC in : http://neuneu.ctw.cc - Neuneu sans vote et sans forcer -+-


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Bjoern A. Zeeb

On Wed, 25 Jun 2008, Julian Elischer wrote:

Hi Julian,

> I don't have time to do a lot of work on it, but if you can get me a patch
> that applies cleanly on -current
> and that you have tested, along with testing other cases (e.g. not compiled
> in)
> then I can give it a look over and if it looks  ok I can commit it
> to -current and then MFC it back to 7 in a week's time or so.


if it would be that easy, it would have happened 2 years ago.

--
Bjoern A. Zeeb  Stop bit received. Insert coin for new game.


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Scott Ullrich
On Wed, Jun 25, 2008 at 3:53 PM, Bjoern A. Zeeb
<[EMAIL PROTECTED]> wrote:
> if it would be that easy, it would have happened 2 years ago.

What can we as a community do to assist in making this easier and doable?

Scott


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Yuri Lukin
On Wed, 25 Jun 2008 12:34:56 -0700, Julian Elischer wrote
> Scott Ullrich wrote:
> > On Wed, Jun 25, 2008 at 2:36 PM, Julian Elischer <[EMAIL PROTECTED]> wrote:
> >>
> >> where is the patch?
> >>
> >>
> > 
> > The version that we use in RELENG_7_0 is located here:
> >
http://cvs.pfsense.org/cgi-bin/cvsweb.cgi/tools/patches/RELENG_7_0/patch-natt-freebsd7-2008-03-11.diff?rev=1.1;content-type=text%2Fplain
> > 
> > Thanks Julian!
> 
> I don't have time to do a lot of work on it, but if you can get me a 
> patch that applies cleanly on -current
> and that you have tested, along with testing other cases (e.g. not 
> compiled in)
> then I can give it a look over and if it looks  ok I can commit it
> to -current and then MFC it back to 7 in a week's time or so.
> is it ABI compatible?
> 
> > 
> > Scott
> 

I believe the original author of the patch has one that should work with 
current:

http://vanhu.free.fr/FreeBSD/




Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Julian Elischer

Scott Ullrich wrote:
> On Wed, Jun 25, 2008 at 3:33 PM, Yuri Lukin <[EMAIL PROTECTED]> wrote:
>> I believe the original author of the patch has one that should work with
>> current:
>>
>> http://vanhu.free.fr/FreeBSD/
>
> Even better.
>
> Looks like http://vanhu.free.fr/FreeBSD/patch-natt-freebsd-HEAD-2008-03-19.diff
> might be semi up to date.
>
> Thanks a million for assisting Julian, this is going to make a lot of
> folks happy! :)

do you have the ability to test this?

> Scott




Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Scott Ullrich
On Wed, Jun 25, 2008 at 4:24 PM, Julian Elischer <[EMAIL PROTECTED]> wrote:
> do you have the ability to test this?

Absolutely.   Is this the only thing preventing it from being merged into HEAD?

Scott


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Brooks Davis
On Wed, Jun 25, 2008 at 04:30:36PM -0400, Scott Ullrich wrote:
> On Wed, Jun 25, 2008 at 4:24 PM, Julian Elischer <[EMAIL PROTECTED]> wrote:
> > do you have the ability to test this?
> 
> Absolutely.   Is this the only thing from preventing it being merged into 
> HEAD?

No.  It's a large and complex patch in a subsystem (ipsec) that must not
be broken.  We're a bit shorthanded in this area, but people have been
working on this for quite some time and IIRC aren't fully comfortable
with the patch yet.

-- Brooks




SOLVED (was Re: Problem clarification (was: Problems with vlan + carp + alias))

2008-06-25 Thread Giulio Ferro
I finally found the problem, and it had nothing to do with either vlans
or carp.

The firewall I was setting up was meant to replace an existing FreeBSD
firewall which didn't use vlans (it had a lot of NICs).
The problem was that the network port where our ISP brings the internet
connection still had the old aliased MAC addresses in its ARP cache. For
some reason, when I plugged in the new firewall, only the base
non-aliased address was updated in the ISP switch ARP cache (if someone
can throw a guess at why, I'm eager to listen).
The ISP router was still looking for the aliased addresses with the old
MACs, so it didn't find them. Moreover, I inadvertently put the vlan
internet interface in promiscuous mode, so with tcpdump I also picked up
those packets with the wrong MAC address, which weren't meant for me.

To make a long story short, I called the ISP's technical customer care
and asked them to reset the ARP cache of the port. Once that was done,
everything worked without a glitch.

The new firewall is now up and running in production with vlan + carp.
Everything seems fine.
Thanks to everybody who answered my plea... :-)


Giulio Ferro wrote:

After some more tests I've finally realized that the problem is with
vlan and alias. I've taken carp out of the picture.


(Please read my previous message on the topic to understand the scenario,
I've reported it below)

Here is what matters in /etc/rc.conf:

---
...
ifconfig_bce0="inet 192.168.26.1 netmask 255.255.255.0"
...
ifconfig_vlan128="inet x.y.z.132 netmask 255.255.255.224 vlan 128 vlandev bce0"
ifconfig_vlan128_alias0="x.y.z.133 netmask 255.255.255.255"
ifconfig_vlan128_alias1="x.y.z.134 netmask 255.255.255.255"
ifconfig_vlan128_alias2="x.y.z.135 netmask 255.255.255.255"
ifconfig_vlan128_alias3="x.y.z.136 netmask 255.255.255.255"
ifconfig_vlan128_alias4="x.y.z.137 netmask 255.255.255.255"
ifconfig_vlan128_alias5="x.y.z.138 netmask 255.255.255.255"
ifconfig_vlan128_alias6="x.y.z.139 netmask 255.255.255.255"
ifconfig_vlan128_alias7="x.y.z.140 netmask 255.255.255.255"
ifconfig_vlan128_alias8="x.y.z.141 netmask 255.255.255.255"
...
defaultrouter="x.y.z.129"
---

netstat -rn
---
default        x.y.z.129          UGS    0  9869  vlan12
x.y.z.128/27   link#11            UC     0     0  vlan12
x.y.z.129      00:00:0c:07:ac:0a  UHLW   2    52  vlan12  1107
x.y.z.130      00:d0:03:8a:9b:fc  UHLW   1     0  vlan12  1147
x.y.z.131      00:d0:03:8a:9b:fd  UHLW   1     0  vlan12  1144
x.y.z.133/32   link#11            UC     0     0  vlan12
x.y.z.134/32   link#11            UC     0     0  vlan12
x.y.z.135/32   link#11            UC     0     0  vlan12
x.y.z.136/32   link#11            UC     0     0  vlan12
x.y.z.137/32   link#11            UC     0     0  vlan12
x.y.z.138/32   link#11            UC     0     0  vlan12
x.y.z.139/32   link#11            UC     0     0  vlan12
x.y.z.140/32   link#11            UC     0     0  vlan12
x.y.z.141/32   link#11            UC     0     0  vlan12
---

ifconfig vlan128
---
vlan128: flags=8843 metric 0 mtu 1500
   options=3
   ether 00:1e:c9:ad:fa:c9
   inet x.y.z.132 netmask 0xffffffe0 broadcast x.y.z.159
   inet x.y.z.133 netmask 0xffffffff broadcast x.y.z.133
   inet x.y.z.134 netmask 0xffffffff broadcast x.y.z.134
   inet x.y.z.135 netmask 0xffffffff broadcast x.y.z.135
   inet x.y.z.136 netmask 0xffffffff broadcast x.y.z.136
   inet x.y.z.137 netmask 0xffffffff broadcast x.y.z.137
   inet x.y.z.138 netmask 0xffffffff broadcast x.y.z.138
   inet x.y.z.139 netmask 0xffffffff broadcast x.y.z.139
   inet x.y.z.140 netmask 0xffffffff broadcast x.y.z.140
   inet x.y.z.141 netmask 0xffffffff broadcast x.y.z.141
   media: Ethernet autoselect (1000baseTX )
   status: active
   vlan: 128 parent interface: bce0
---
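
The address arithmetic in this configuration can be checked with a short sketch. The real prefix is redacted as x.y.z in the message, so the documentation prefix 192.0.2.0/24 is used here as a stand-in; that substitution is an assumption made purely for illustration.

```python
import ipaddress

# 192.0.2.0/24 is a documentation prefix standing in for the redacted x.y.z.
primary = ipaddress.ip_interface("192.0.2.132/255.255.255.224")
alias = ipaddress.ip_interface("192.0.2.133/255.255.255.255")
router = ipaddress.ip_address("192.0.2.129")

print(primary.network)                    # the /27 that holds the primary address
print(primary.network.broadcast_address)  # last address of that /27
print(router in primary.network)          # is the default router on-link?
print(alias.ip in primary.network)        # does the alias fall inside the same /27?
print(alias.network)                      # the alias itself is a host route
```

A netmask of 255.255.255.224 gives a 32-address block starting at .128, which is consistent with the broadcast address x.y.z.159 and the default router x.y.z.129 shown above, so the addressing itself looks sound.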

Tests:
No problem when I try to ping the default gateway from my fw
No problem when I ping my fw from an external internet address

Problems:
- I cannot ping the router from one of the aliased address:
   ping -S x.y.z.133 x.y.z.129
- I cannot ping the aliased addresses from an external internet address

Note: I can see the packets with tcpdump travelling from and to the aliased
address. It seems the interface won't process them for some reason.

This seems suspiciously like a bug to me...


-- 


(previous message on vlan + carp +alias)
-- 




Primeroz lists wrote:
What is t

Re: FreeBSD 7.0: sockets stuck in CLOSED state...

2008-06-25 Thread Ali Niknam

Hi Robert,

> Sounds like there's a bug somewhere.  Before we start trying to track it
> [...]
> So, with that introduction, we're interested in resolving:



Quite comprehensive indeed; thank you for all that information. I was 
not aware that there was a decoupling between the various parts of the 
abstractions, but now that I think of it, it's more or less logical I guess.


> The first is the easiest to resolve, as all we need to do is see whether
> [...]
> the file descriptor numbers being returned to see whether, perhaps, that
> number only goes up over time, and gets really big.




My personal feeling is that it's a race condition; no idea why, but it 
feels that way. Maybe because it's such a small number compared to 
the large number of connections that take place.


As far as I can see I do not leak file descriptors. I can send you the 
information you ask for (netstat, sockstat, fstat, etc.) off-list if you 
like, or, if you prefer, I can give you access to the machine; please let 
me know which you prefer.


I'd like to reiterate that at this moment I'm not sure at all if it's my 
code or kernel code. However, to my mind I've seen sufficient 
information to reasonably suspect that it _might_ be something outside 
my code :).


> wedged-up state. It would be most helpful if you could actually shut
> down to single-user mode, killing all user processes, then waiting ten
> minutes, and capturing the output of those above commands to files that
> you can then e-mail to me.




Because it's a live machine that would be very difficult. Maybe, if you 
really really need it that way and we can't find another way, I can 
announce maintenance and do it in the middle of the night :).


> Without accusing you of having buggy code, I should say that I think
> there's a reasonable chance that what you're seeing is an interaction
> between an existing leak of resources in the application and the way the
> kernel state management has changed.  The output from netstat pretty

Yes that was the first thing I thought of as well, however, especially 
one of the two applications is so simple that I would be ashamed to 
death if I still had a bug in there :). If it turns out that way: 
 ;).


> precisely matches that what you'd expect: lots of TCP connections in the
> CLOSED state reflecting a series of connections built by an application
> but then not properly discarded. Likewise, when the application is
> killed, all of the connections go away -- most likely because the file
> descriptors are all closed, allowing them to be garbage collected and
> connection state freed.  If it is this sort of bug, then most likely
> you're missing a call to close() in a work loop somewhere, and in some
> exceptional case, you fall out of the loop without calling close().
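
The kind of bug described here can be sketched as follows. Both handlers are hypothetical and written in Python for brevity; they are not taken from the application under discussion. The first closes the socket only on the normal path, the second on every path.

```python
import socket


def handle_leaky(conn):
    """Closes the socket only when the exchange completes normally."""
    data = conn.recv(2)
    if not data:  # EOF: this early return skips close()
        return
    conn.sendall(data)
    conn.close()


def handle_safe(conn):
    """Closes the socket on every path, including EOF and exceptions."""
    try:
        data = conn.recv(2)
        if not data:
            return
        conn.sendall(data)
    finally:
        conn.close()
```

In Python the leaked socket object is eventually closed when it is garbage collected; a C program has no such safety net, so a descriptor skipped this way stays open until the process exits.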




I will double check this once more, but honestly, I strongly doubt it...

Also, one other thing that I've noticed is that it's always the input 
buffer that has bytes left; never the output buffer...


Moreover, I've seen that close() reports EBADF, but due to the insane 
number of connections I cannot say for certain that that's when the 
connection goes into CLOSED state. The IPs do match, but it's very 
common for the same IPs to make numerous connections too.
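
For reference, close() fails with EBADF when the descriptor number it is given is not currently open, for example because it was already closed. The sketch below reproduces that with a pipe; the helper name close_twice is hypothetical, and this is only one possible explanation for the errors seen here, not a diagnosis.

```python
import errno
import os


def close_twice():
    """Close one descriptor twice and return the errno of the second close."""
    read_end, write_end = os.pipe()
    os.close(write_end)
    os.close(read_end)  # first close succeeds
    try:
        os.close(read_end)  # the descriptor number is no longer open
    except OSError as exc:
        return exc.errno
    return 0


print(errno.errorcode[close_twice()])
```

In a multithreaded program a double close is more dangerous than the error suggests: if another thread is handed the same descriptor number between the two calls, the second close() succeeds and silently closes that thread's socket instead.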


Kind Regards,

Ali


--
  Transip BV | http://www.transip.nl/
  We never let you down.


Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Julian Elischer

Bjoern A. Zeeb wrote:
> On Wed, 25 Jun 2008, Julian Elischer wrote:
>
> Hi Julian,
>
>> I don't have time to do a lot of work on it, but if you can get me a
>> patch that applies cleanly on -current
>> and that you have tested, along with testing other cases (e.g. not
>> compiled in)
>> then I can give it a look over and if it looks  ok I can commit it
>> to -current and then MFC it back to 7 in a week's time or so.
>
> if it would be that easy, it would have happened 2 years ago.




I don't see anything in there that would stop a commit.

Please be more specific?



Re: FreeBSD NAT-T patch integration

2008-06-25 Thread mgrooms

On Wed, 25 Jun 2008 12:34:56 -0700, Julian Elischer <[EMAIL PROTECTED]>
wrote:
> Scott Ullrich wrote:
>> On Wed, Jun 25, 2008 at 2:36 PM, Julian Elischer <[EMAIL PROTECTED]>
> wrote:
>>>
>>> where is the patch?
>>>
>>>
>>
>> The version that we use in RELENG_7_0 is located here:
>>
>
http://cvs.pfsense.org/cgi-bin/cvsweb.cgi/tools/patches/RELENG_7_0/patch-natt-freebsd7-2008-03-11.diff?rev=1.1;content-type=text%2Fplain
>>
>> Thanks Julian!
> 
> I don't have time to do a lot of work on it, but if you can get me a
> patch that applies cleanly on -current
> and that you have tested, along with testing other cases (e.g. not
> compiled in)
> then I can give it a look over and if it looks  ok I can commit it
> to -current and then MFC it back to 7 in a week's time or so.
> is it ABI compatible?
> 

Julian,

To my knowledge, these are the latest patch sets:

http://vanhu.free.fr/FreeBSD/patch-natt-freebsd6-2007-05-31.diff
http://vanhu.free.fr/FreeBSD/patch-natt-freebsd7-2008-03-11.diff
http://vanhu.free.fr/FreeBSD/patch-natt-freebsd-HEAD-2008-03-19.diff

Thanks,

-Matthew



Re: FreeBSD NAT-T patch integration

2008-06-25 Thread Julian Elischer

Scott Ullrich wrote:

On Wed, Jun 25, 2008 at 3:53 PM, Bjoern A. Zeeb
<[EMAIL PROTECTED]> wrote:

if it would be that easy, it would have happened 2 years ago.


What can we as a community do to assist in making this easier and doable?


that is the question..

NAT-T is a very useful feature, and not having it is a strike against
FreeBSD in a lot of eyes. (It's needed, for example, to talk to ASA
firewalls for VPN stuff, I believe.)

Anyhow I looked at the patch briefly and haven't seen anything
horrendously terrible in it..
I'll look a bit more later but it doesn't seem to interfere
with unrelated code that I can see.







Scott


Re: Why isn't ALTQ in GENERIC?

2008-06-25 Thread Erik Osterholm
On Wed, Jun 25, 2008 at 03:13:54AM +0200, Max Laier wrote:
> Hi Erik,
> 
> Am Di, 24.06.2008, 23:26, schrieb Erik Osterholm:
> > Can anyone tell me if there are good reasons for explicitly leaving
> > ALTQ out of the kernel?  More specific to my circumstances, if I'm
> > building kernels to be installed on every machine we deploy, is it
> > worth building a separate kernel for ALTQ for those few boxes which
> > will require it?
> >
> > Are there performance issues?  Stability issues?  Ultimately, I'm just
> > surprised that it's not available in GENERIC if there isn't a good
> > reason, but I can't find any documentation for that reason.
> 
> Short answer: Historical reasons.
> 
> Whole story: When ALTQ was added there were both performance and stability
> concerns.  For a long time we had a big #ifdef ALTQ in if_var.h to avoid
> one additional check for if_queue enqueue operations.  These are now gone
> and I personally don't see any issues that would prevent ALTQ from being
> in GENERIC.  However, it's unclear which disciplines to turn on by
> default.  I'd like to see ALTQ in GENERIC, but I've been reluctant to
> make the change on my own.  If we can get a quorum here, I'll reconsider
> it.
 

Thanks for the explanation.  I think that it would be nice to have in
GENERIC, but my immediate concerns were about performance and
stability.
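
For the separate-kernel route asked about above, a custom kernel
configuration could look roughly like the fragment below. This is only a
sketch: the option names are the ALTQ options listed in sys/conf/NOTES,
the ident ALTQKERNEL is a made-up name, and the choice of disciplines is
an assumption, since which ones to enable by default is exactly the open
question.

```
# Hypothetical kernel config: GENERIC plus ALTQ
include         GENERIC
ident           ALTQKERNEL      # illustrative name, not an official config

options         ALTQ
options         ALTQ_CBQ        # Class Based Queueing
options         ALTQ_RED        # Random Early Detection
options         ALTQ_RIO        # RED In/Out
options         ALTQ_HFSC       # Hierarchical Fair Service Curve
options         ALTQ_PRIQ       # Priority Queueing
options         ALTQ_NOPCC      # Required for SMP builds
```

Disciplines that are not needed on the target boxes can simply be left
out of the list.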

Thanks!

Erik


Re: FreeBSD 7.0: sockets stuck in CLOSED state...

2008-06-25 Thread Lawrence Stewart

Ali Niknam wrote:

Hi Robert,


[snip]



I will double check this once more, but honestly, I strongly doubt it...

Also one other thing that I've noticed, is that it's always the input 
buffer that has bytes left; never the output buffer...


Moreover, I've seen that close() reports EBADF, but due to the insane 
amount of connections I cannot say for certain that that's when the 
connection goes into CLOSED state. The IPs do match, but it's very 
common for the same IPs to make numerous connections too.
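
One way to check whether an EBADF from close() lines up with a
particular connection is to log the descriptor and the peer address at
the point of failure. The Python sketch below only illustrates that
idea; close_and_report and its peer/log parameters are hypothetical
names, and the real application would do the equivalent in its own
language, capturing the peer address before the close is attempted.

```python
import errno
import os


def close_and_report(fd, peer=None, log=print):
    """Close a raw file descriptor and report any failure.

    Returns None on success, or the symbolic errno name (for example
    "EBADF") when close() fails. `peer` is whatever label the caller
    recorded for the connection before closing it.
    """
    try:
        os.close(fd)
    except OSError as exc:
        # Map the numeric errno to its symbolic name for the log line.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        log(f"close(fd={fd}) peer={peer} failed: {name}")
        return name
    return None
```

Closing the same descriptor twice is the simplest way to see the EBADF
path: the first call succeeds and the second one is logged together
with the recorded peer.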




To get a bit more detail about the state of the tcb and socket buffers 
at the time the connection is shut down, you can use my SIFTR tool, 
available from:


http://caia.swin.edu.au/urp/newtcp/tools.html

The readme should explain how to use it. Please keep the "ppl" sysctl at 
1. Once you have some data collected for tcbs you know end up in the 
unexpected CLOSED state, have a look at the relevant fields in the SIFTR 
log file and let us know what you find. Might be useful if you send the 
log file through as well for me to have a quick look.


Cheers,
Lawrence


Re: SOLVED (was Re: Problem clarification (was: Problems with vlan + carp + alias))

2008-06-25 Thread Steve Bertrand

Giulio Ferro wrote:
I finally got the problem, and it had nothing to do either with VLANs or
with CARP.

The firewall I was setting up was meant to replace an existing FreeBSD
firewall which didn't use VLANs (it had a lot of NICs).
The problem was that the network port where our ISP brings the internet
connection still had the old aliased MAC addresses in its ARP cache.


Thank you Giulio (is it Gio?)... for replying to everyone with a 
definitive conclusion. That's fantastic for the followers of the thread, 
and for the archives as well.


For some reason when I plugged in the new firewall, only the base
non-aliased address was updated in the ISP switch arp cache (if someone
can throw a guess at why, I'm eager to listen).


Well, you need to know what type of switch they had upstream, and why 
it wasn't updating its ARP cache dynamically. Perhaps the cache TTL was 
too long (due to the type of hardware, or an administrative setting).


I almost have to assume it wasn't a Cisco... only because I would have 
expected different behavior (less administrative setting) (this is my 
personal experience...I'm not trying to favour a brand in any way).


Perhaps you could ask them to provide the command they issued to 
determine how they found the problem. Better yet, ask what type of 
device your box is connected to at their end of the VLAN.


If you can find out what device they have at their end, it may be 
possible to non-destructively 'force' them to clear their ARP cache 
remotely, and at the same time provide advice to the well-meaning 
people who may run into this in the future.
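
A common way to nudge an upstream device into refreshing a stale entry
is a gratuitous ARP for each affected address. The Python sketch below
only builds such a frame so its layout can be inspected; whether the
ISP's device honours it is an assumption, build_gratuitous_arp is a
hypothetical helper, and actually transmitting the frame would need a
raw interface such as BPF, which is not shown.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request (Ethernet + ARP, 42 bytes).

    `mac` is the sender hardware address as aa:bb:cc:dd:ee:ff and `ip`
    is the (aliased) IPv4 address being announced.
    """
    hw = bytes.fromhex(mac.replace(":", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our source, EtherType ARP.
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), lengths 6 and 4, request (1).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Gratuitous ARP: sender IP and target IP are the same address;
    # the target hardware address is left as zeros.
    arp += hw + addr + b"\x00" * 6 + addr
    return eth + arp
```

One frame would be built per aliased address, each carrying the MAC
address that should now answer for it.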


I'd be just as interested to know what they had at their end for 
hardware, as I have been waiting to hear what your resolution was 
throughout your time-consuming troubleshooting...


Steve