Re: Slipping in the window update

2005-01-10 Thread Don Lewis
On  9 Jan, Mike Silbersack wrote:
> 
> Ok, here's an updated patch for the SYN case.  I've included the patch 
> relative to 6.x, and some text from a tcpdump showing it in action.
> 
> It responds to each SYN with an ACK like the latest tcpsecure document 
> states, but it uses a global counter to rate limit the number of ACKs of 
> this type that it will send to 200 per second.
> 
> I was unable to incorporate the connect idle heuristic I wanted to because 
> right now the incoming spoofed syns would reset the idle counter, which 
> sounds like it could cause a problem somehow... best not to use it for 
> now.  Maybe a future change can clean up that along with the dropafterack 
> case in tcp_input, but that would make this patch far too complex.
> 
> Please take a look at the patch and the abbreviated tcpdump from my test 
> and see if it looks correct.
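
For readers following along, here is a minimal user-space sketch of the kind of global per-second counter described above. It is not the patch itself: the struct, the function name, and the use of whole-second timestamps are assumptions made for illustration, and only the 200-per-second figure comes from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit taken from the description above: 200 ACKs per second. */
#define ACK_LIMIT_PER_SEC 200

/* Hypothetical global limiter state (not a kernel structure). */
struct ack_limiter {
    int64_t  window_start;  /* second in which the current count began */
    unsigned sent;          /* ACKs allowed so far in that second */
};

/*
 * Return true if another ACK may be sent at time `now` (whole seconds),
 * false if the budget for the current second is already used up.
 */
static bool
ack_limiter_allow(struct ack_limiter *lim, int64_t now)
{
    if (now != lim->window_start) {
        /* A new one-second window has started: reset the counter. */
        lim->window_start = now;
        lim->sent = 0;
    }
    if (lim->sent >= ACK_LIMIT_PER_SEC)
        return false;
    lim->sent++;
    return true;
}
```

A caller would check ack_limiter_allow() before each response and silently drop the segment when it returns false.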

> + if (thflags & TH_SYN) {
> + if (tp->t_state == TCPS_ESTABLISHED &&
> + tcp_insecure_syn == 0) {


Any good reason for the extra level of nesting?

Testing the SYN flag first is probably optimal, since in normal
operation the vast majority of segments won't have this flag set.

> + if (tp)
> + INP_UNLOCK(inp);

If we've successfully dereferenced tp->t_state, it should not be
necessary to protect INP_UNLOCK() with
if (tp)

> + if (headlocked)
> + INP_INFO_WUNLOCK(&tcbinfo);

I suspect that the headlocked flag is also known at this point in the
code.

Ordinary data segments will have TH_SYN checked twice: first in this
new code, and again after the segment has been trimmed to fit the
window.

    /*
     * If a SYN is in the window, then this is an
     * error and we send an RST and drop the connection.
     */
    if (thflags & TH_SYN) {
        tp = tcp_drop(tp, ECONNRESET);
        rstreason = BANDLIM_UNLIMITED;
        goto drop;
    }

This could make a bit of a performance difference at high speeds, for
instance gigabit Ethernet in a compute cluster.

An alternate fix would be to modify the latter block of code as follows:

    if (thflags & TH_SYN) {
+       if (tp->t_state == TCPS_ESTABLISHED &&
+           tcp_insecure_syn == 0)
+           goto dropafterack;
        tp = tcp_drop(tp, ECONNRESET);
        rstreason = BANDLIM_UNLIMITED;
        goto drop;
    }

and then after the dropafterack label add the code:

+   if (thflags & TH_SYN) {
+       if (tp->t_state == TCPS_ESTABLISHED &&
+           tcp_insecure_syn == 0) {
+           if (badport_bandlim(BANDLIM_SYN_ESTABLISHED) < 0)
+               goto drop;
+           tcp_respond(tp, mtod(m, void *), th, m, tp->rcv_nxt,
+               tp->snd_una, TH_ACK);
    [snip]

I don't think this fix would be complete from the response rate limiting
point of view because this chunk of code in the block that trims to the
left window edge tosses the TH_SYN flag.

    todrop = tp->rcv_nxt - th->th_seq;
    if (todrop > 0) {
        if (thflags & TH_SYN) {
            thflags &= ~TH_SYN;
            th->th_seq++;
            if (th->th_urp > 1)
                th->th_urp--;
            else
                thflags &= ~TH_URG;
            todrop--;
        }

and this block of code doesn't jump to dropafterack, even in the case
where the entire segment is to the left of the window.  Something else
would have to be done to implement rate limiting for this half of the
sequence space.

Now that I've looked at the above case, it looks to me like your
suggested patch might affect the response to a legitimate duplicate SYN.
It will definitely follow a different code path.

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Slipping in the window update

2005-01-10 Thread Don Lewis
After a bit more thinking ...

On 10 Jan, Don Lewis wrote:

> and then after the dropafterack label add the code:
> 
> + if (thflags & TH_SYN) {
> + if (tp->t_state == TCPS_ESTABLISHED &&
> + tcp_insecure_syn == 0) {
> + if (badport_bandlim(BANDLIM_SYN_ESTABLISHED) < 0)
> + goto drop;
> + tcp_respond(tp, mtod(m, void *), th, m, tp->rcv_nxt,
> + tp->snd_una, TH_ACK);
>   [snip]
> 
> I don't think this fix would be complete from the response rate limiting
> point of view because this chunk of code in the block that trims to the
> left window edge tosses the TH_SYN flag.
> 
> todrop = tp->rcv_nxt - th->th_seq;
> if (todrop > 0) {
> if (thflags & TH_SYN) {
> thflags &= ~TH_SYN;
> th->th_seq++;
> if (th->th_urp > 1)
> th->th_urp--;
> else
> thflags &= ~TH_URG;
> todrop--;
> }
> 
> and this block of code doesn't jump to dropafterack, even in the case
> where the entire segment is to the left of the window.  Something else
> would have to be done to implement rate limiting for this half of the
> sequence space.

I think this problem could be solved by a minor addition to the above
block of code.  If the SYN flag is set and the sequence number of the
segment doesn't match the initial received sequence number of the
connection, then we know this is not a duplicate SYN.

    todrop = tp->rcv_nxt - th->th_seq;
    if (todrop > 0) {
        if (thflags & TH_SYN) {
+           if (th->th_seq != tp->irs)
+               goto dropafterack;
            thflags &= ~TH_SYN;
            th->th_seq++;
            if (th->th_urp > 1)
                th->th_urp--;
            else
                thflags &= ~TH_URG;
            todrop--;
        }
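
To make the distinction concrete, here is a small standalone sketch of how a SYN arriving on an established connection could be classified from its sequence number. The enum and function names are hypothetical, SEQ_LT is redefined locally so the example compiles on its own, and this is only an illustration of the test proposed above, not code from tcp_input.c.

```c
#include <stdint.h>

/* Modular comparison: true if a is before b in 32-bit sequence space. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)

/* Hypothetical verdicts for a SYN seen on an established connection. */
enum syn_verdict {
    SYN_DUPLICATE,    /* left of the window and seq == irs: trim as before */
    SYN_STALE_LEFT,   /* left of the window but not the original SYN */
    SYN_IN_OR_RIGHT   /* at or beyond rcv_nxt: left to the later SYN check */
};

static enum syn_verdict
classify_syn(uint32_t seq, uint32_t irs, uint32_t rcv_nxt)
{
    /* Same condition as "todrop > 0" in the trimming block. */
    if (!SEQ_LT(seq, rcv_nxt))
        return SYN_IN_OR_RIGHT;
    return (seq == irs) ? SYN_DUPLICATE : SYN_STALE_LEFT;
}
```

Only the SYN_STALE_LEFT case would take the new "goto dropafterack" path; a retransmitted original SYN is still trimmed as it was before.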



Re: Slipping in the window update

2005-01-10 Thread Mike Silbersack
On Mon, 10 Jan 2005, Don Lewis wrote:
> Now that I've looked at the above case, it looks to me like your
> suggested patch might affect the response to a legitimate duplicate SYN.
> It will definitely follow a different code path.

You're right, I neglected to handle the duplicate SYN case.

Couldn't we centralize all SYN handling right after trimthenstep6:?

We could do something there like

    if (th->th_seq != tp->irs) {
        goto dropafterack; /* Or however we handle these bad syns */
    } else {
        thflags &= ~TH_SYN;
        th->th_seq++;
        if (th->th_urp > 1)
            th->th_urp--;
        else
            thflags &= ~TH_URG;
        todrop--;
    }

And then we could tear out both places TH_SYN is mentioned below: the
place I copied from, and the place where the tcp_drop() is.

If we made that change, then we'd still be doing only one check for 
TH_SYN, but the code would be a lot easier to comprehend.

Mike "Silby" Silbersack


Re: Slipping in the window update

2005-01-10 Thread Mike Silbersack
On Mon, 10 Jan 2005, Mike Silbersack wrote:
> We could do something there like
>
>     if (th->th_seq != tp->irs) {
>         goto dropafterack; /* Or however we handle these bad syns */
>     } else {
>         thflags &= ~TH_SYN;
>         th->th_seq++;
>         if (th->th_urp > 1)
>             th->th_urp--;
>         else
>             thflags &= ~TH_URG;
>         todrop--;
>     }

Uh, I greatly oversimplified the changes that would be needed there, so 
that implementation would be totally wrong.  I'll go get some sleep and 
then think about the implementation...

Mike "Silby" Silbersack


Re: Slipping in the window update

2005-01-10 Thread Don Lewis
On 10 Jan, Mike Silbersack wrote:
> 
> On Mon, 10 Jan 2005, Don Lewis wrote:
> 
>> Now that I've looked at the above case, it looks to me like your
>> suggested patch might affect the response to a legitimate duplicate SYN.
>> It will definitely follow a different code path.
> 
> You're right, I neglected to handle the duplicate SYN case.
> 
> Couldn't we centralize all SYN handling right after trimthenstep6:?
> 
> We could do something there like
> 
> if (th->th_seq != tp->irs) {
>   goto dropafterack; /* Or however we handle these bad syns */
> } else {
> thflags &= ~TH_SYN;
> th->th_seq++;
> if (th->th_urp > 1)
>   th->th_urp--;
> else
>   thflags &= ~TH_URG;
> todrop--;
> }

My thinking is that the security problem is confined to the following
block of code:

    /*
     * If a SYN is in the window, then this is an
     * error and we send an RST and drop the connection.
     */
    if (thflags & TH_SYN) {
        tp = tcp_drop(tp, ECONNRESET);
        rstreason = BANDLIM_UNLIMITED;
        goto drop;
    }

and that to implement the recommendation in the Internet Draft, it is
only necessary to change the actions taken inside this "if" block.

If response rate limiting is implemented, I'd actually prefer a more
general solution that is at least somewhat independent of the SYN flag,
since if the goal of an attacker is to cause a flood of ACK responses,
he can just as easily trigger it by sending spoofed packets that don't
have the SYN flag set.  The SYN flag could be used as a hint, but the
solution should be more general.



Current problem reports assigned to you

2005-01-10 Thread FreeBSD bugmaster
Current FreeBSD problem reports
Critical problems
Serious problems

S  Submitted   Tracker Resp.   Description
---
o [2002/07/26] kern/41007  net overfull traffic on third and fourth adap

1 problem total.

Non-critical problems

S  Submitted   Tracker Resp.   Description
---
o [2003/07/11] kern/54383  net [nfs] [patch] NFS root configurations wit

1 problem total.



buildup of Windows time_wait talking to fbsd 4.10

2005-01-10 Thread Len Conrad
We have a windows mailserver that relays its outbound to a fbsd 
gateway.  We changed to a different fbsd gateway running 4.10. Windows then 
began having trouble sending to 4.10.  Windows "netstat -an" shows  dozens 
of lines like this:

         source IP          destination IP
  ======================================================
  TCP    10.1.16.3:1403     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1407     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1415     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1419     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1435     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1462     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1470     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1473     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1478     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1493     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1504     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1507     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1508     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1521     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1526     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1546     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1550     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1568     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1571     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1589     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1592     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1616     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1620     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1629     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1644     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1647     192.168.200.59:25    TIME_WAIT
  TCP    10.1.16.3:1654     192.168.200.59:25    TIME_WAIT

Eventually, the Windows SMTP server logs lines like "cannot connect to
remote IP" or "address already in use", we think because no local TCP/IP
sockets are available.

The new gateway/fbsd 4.10 "sockstat -4" shows no corresponding tcp 
connections when the Windows server is showing as above.  On the fbsd 4.10 
machines, smtp logs, syslog, and dmesg show no errors.

When we switch the Windows box back to using the old box (fbsd 4.7) as its
SMTP gateway, all is cool.

Suggestions on how to proceed with debugging, please.

I'm trying to get the dmesg.boot for the 4.7 and 4.10 boxes now, sorry.

Len


Re: buildup of Windows time_wait talking to fbsd 4.10

2005-01-10 Thread Danny
On Mon, 10 Jan 2005 10:53:39 -0600, Len Conrad <[EMAIL PROTECTED]> wrote:
> 
> We have a windows mailserver that relays its outbound to a fbsd
> gateway.  We changed to a different fbsd gateway running 4.10. Windows then
> began having trouble sending to 4.10.  Windows "netstat -an" shows  dozens
> of lines like this:
> 
>   source IP  desitination IP
> ==
>TCP10.1.16.3:1403 192.168.200.59:25  TIME_WAIT
> [snip]
> 
> Eventually, the windows SMTP logs line like "cannot connect to remote IP"
> or "address already in use" because no local tcp/ip sockets are available,
> we think.
> 
> The new gateway/fbsd 4.10 "sockstat -4" shows no corresponding tcp
> connections when the Windows server is showing as above.  On the fbsd 4.10
> machines, smtp logs, syslog, and dmesg show no errors.
> 
> We switch the windows box to smtp gateway towards the old box/fbsd 4.7, all
> is cool.
> 
> Suggestions with how to proceed debugging, please.
> 
> I'm trying to get the dmesg.boot for the 4.7 and 4.10 boxes now, sorry.

What shows up when you run a network sniffer on either machine?

...D


Re: buildup of Windows time_wait talking to fbsd 4.10

2005-01-10 Thread laffer1

On Mon, 10 Jan 2005, Len Conrad wrote:
> We have a windows mailserver that relays its outbound to a fbsd gateway.  We
> changed to a different fbsd gateway running 4.10. Windows then began having
> trouble sending to 4.10.  Windows "netstat -an" shows  dozens of lines like
> this:
>
>        source IP          destination IP
> ======================================================
> TCP    10.1.16.3:1403     192.168.200.59:25    TIME_WAIT
> [snip]
>
> Eventually, the windows SMTP logs line like "cannot connect to remote IP" or
> "address already in use" because no local tcp/ip sockets are available, we
> think.
>
> The new gateway/fbsd 4.10 "sockstat -4" shows no corresponding tcp
> connections when the Windows server is showing as above.  On the fbsd 4.10
> machines, smtp logs, syslog, and dmesg show no errors.
>
> We switch the windows box to smtp gateway towards the old box/fbsd 4.7, all
> is cool.
>
> Suggestions with how to proceed debugging, please.
>
> I'm trying to get the dmesg.boot for the 4.7 and 4.10 boxes now, sorry.
>
> Len

Just off the top of my head...
You mentioned the freebsd machine is the gateway.  Do you have a firewall 
on the host blocking connections from the windows machine?  Do you have a 
different kernel configuration between 4.7 and 4.10?  i.e. do you have 
something like ipdivert, etc in the kernel on one box and not the other? 
Can the windows machine ping the ip 192.168.200.59, given that it's on a
different class C?

Luke


Re: buildup of Windows time_wait talking to fbsd 4.10

2005-01-10 Thread Len Conrad

> Just off the top of my head...
>
> You mentioned the freebsd machine is the gateway.  Do you have a firewall
> on the host blocking connections from the windows machine?

a forgotten detail is that the windows machine sends just fine to the 4.10
gateway for a few minutes, but the time_wait inevitably builds up, so smtp
access from windows to either gateway is ok.

> Do you have a different kernel configuration between 4.7 and 4.10?

both GENERIC

> i.e. do you have something like ipdivert, etc in the kernel on one box
> and not the other? Can the windows machine ping the ip 192.168.200.59 as
> its a different class C?

sure, basic connectivity is ok.

Len
_
http://IMGate.MEIway.com : free anti-spam gateway, runs on 1000's of sites


Re: buildup of Windows time_wait talking to fbsd 4.10

2005-01-10 Thread Giorgos Keramidas
On 2005-01-10 11:26, Len Conrad <[EMAIL PROTECTED]> wrote:
>> Just off the top of my head...
>>
>> You mentioned the freebsd machine is the gateway.  Do you have a
>> firewall on the host blocking connections from the windows machine?
>
> a forgotten detail is that the windows machine sends just fine to
> the 4.10 gateway for a few minutes, but the time_wait inevitably
> builds up, so smtp access from windows to either gateway is ok.
>
>> Do you have a different kernel configuration between 4.7 and 4.10?
>
> both GENERIC
>
>> i.e. do you have something like ipdivert, etc in the kernel on one
>> box and not the other? Can the windows machine ping the ip
>> 192.168.200.59 as its a different class C?
>
> sure, basic connectivity is ok.

Are you using a firewall?  If yes, is it stateful?  Can we see the
ruleset?



Bug in TCP window update?

2005-01-10 Thread Girish Rayas
In tcp_input.c, the send window is updated when the condition below is
true:

    if ((thflags & TH_ACK) &&
        (SEQ_LT(tp->snd_wl1, th->th_seq) ||
        (tp->snd_wl1 == th->th_seq && (SEQ_LT(tp->snd_wl2, th->th_ack) ||
        (tp->snd_wl2 == th->th_ack && tiwin > tp->snd_wnd)))))

This check is meant to prevent old segments from affecting the send
window.  But the left-trim logic that runs earlier in tcp_input.c sets
th->th_seq to tp->rcv_nxt for old segments.  In many scenarios this
effectively makes snd_wl1 < th_seq, so old segments cause an incorrect
window update.

Using the actual sequence number of the received segment in the above if
statement will fix the problem.  Any comments?
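
A standalone sketch of the effect described above may help when discussing it. The struct and function names below are made up for illustration, SEQ_LT is redefined locally, and the TH_ACK test is left out. It assumes, as stated above, that trimming leaves th_seq equal to rcv_nxt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modular comparison: true if a is before b in 32-bit sequence space. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)

/* Only the connection fields that the window update check reads. */
struct wnd_state {
    uint32_t snd_wl1;  /* segment seq used for the last window update */
    uint32_t snd_wl2;  /* segment ack used for the last window update */
    uint32_t snd_wnd;  /* current send window */
};

/* The window update condition, with the segment seq passed in explicitly. */
static bool
should_update_window(const struct wnd_state *tp, uint32_t seg_seq,
    uint32_t seg_ack, uint32_t tiwin)
{
    return SEQ_LT(tp->snd_wl1, seg_seq) ||
        (tp->snd_wl1 == seg_seq &&
        (SEQ_LT(tp->snd_wl2, seg_ack) ||
        (tp->snd_wl2 == seg_ack && tiwin > tp->snd_wnd)));
}
```

With snd_wl1 = 1800 and rcv_nxt = 2000, a segment whose original sequence number is 1500 fails the check when that original value is passed in. It passes once the trimmed value 2000 is used instead, which is the incorrect update in question.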


Re: buildup of Windows time_wait talking to fbsd 4.10

2005-01-10 Thread Lars Erik Gullerud
On Mon, 10 Jan 2005, Len Conrad wrote:
> We have a windows mailserver that relays its outbound to a fbsd gateway.  We
> changed to a different fbsd gateway running 4.10. Windows then began having
> trouble sending to 4.10.  Windows "netstat -an" shows  dozens of lines like
> this:
>
>        source IP          destination IP
> ======================================================
> TCP    10.1.16.3:1403     192.168.200.59:25    TIME_WAIT
> [snip]
>
> Eventually, the windows SMTP logs line like "cannot connect to remote IP" or
> "address already in use" because no local tcp/ip sockets are available, we
> think.
>
> The new gateway/fbsd 4.10 "sockstat -4" shows no corresponding tcp
> connections when the Windows server is showing as above.  On the fbsd 4.10
> machines, smtp logs, syslog, and dmesg show no errors.
>
> We switch the windows box to smtp gateway towards the old box/fbsd 4.7, all
> is cool.

OK, let me play a wild hunch here - if you look at netstat -na output on 
the 4.7 machine (the one that works) when you are using that one, you see 
a large number of connections in the TIME_WAIT state on that side, while 
none on the Windows-server?

I had a similar situation with an application we use that also opens a
large number of TCP sessions from a Windows server to a FreeBSD server -
that suddenly stopped working when the application in question was
upgraded on the server it connected to. In our case, it turns out it was a
timing issue that changed in the new version of the application.

When a TCP connection is closing, one side of the connection typically 
initiates the close, and sends a FIN,ACK packet to the other side. After 
going through the steps of closing down the socket, the side that 
initiated the close, will leave the socket in TIME-WAIT state for 2 MSL 
(Maximum Segment Lifetime - which defaults to 2 mins, so 4 min wait) - 
while the other end transitions to CLOSED state (and tears down the 
socket) immediately, without this wait period. (The exception being if 
both ends send FIN,ACK at the same time, in which case they both go to 
TIME-WAIT).
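
The close sequence described above can be sketched as a small state table. This is a simplified illustration, not any stack's implementation: the state names follow the standard TCP state diagram, while the event names and the function are invented here, and the case where a FIN and the ACK of our own FIN arrive in one segment is left out.

```c
/* Connection states involved in closing (standard TCP state names). */
enum tcp_state {
    ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_CLOSING,
    ST_TIME_WAIT, ST_CLOSE_WAIT, ST_LAST_ACK, ST_CLOSED
};

/* Hypothetical event names for this sketch. */
enum tcp_event {
    EV_CLOSE,            /* local application closes the socket */
    EV_RCV_FIN,          /* FIN received from the peer */
    EV_RCV_ACK_OF_FIN,   /* peer acknowledged our FIN */
    EV_2MSL_TIMEOUT      /* TIME-WAIT timer (2 * MSL) expired */
};

/* Return the next state; events that do not apply leave the state as is. */
static enum tcp_state
tcp_close_step(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ST_ESTABLISHED:
        if (e == EV_CLOSE) return ST_FIN_WAIT_1;      /* active closer */
        if (e == EV_RCV_FIN) return ST_CLOSE_WAIT;    /* passive closer */
        break;
    case ST_FIN_WAIT_1:
        if (e == EV_RCV_ACK_OF_FIN) return ST_FIN_WAIT_2;
        if (e == EV_RCV_FIN) return ST_CLOSING;       /* simultaneous close */
        break;
    case ST_FIN_WAIT_2:
        if (e == EV_RCV_FIN) return ST_TIME_WAIT;
        break;
    case ST_CLOSING:
        if (e == EV_RCV_ACK_OF_FIN) return ST_TIME_WAIT;
        break;
    case ST_CLOSE_WAIT:
        if (e == EV_CLOSE) return ST_LAST_ACK;
        break;
    case ST_LAST_ACK:
        if (e == EV_RCV_ACK_OF_FIN) return ST_CLOSED; /* no TIME-WAIT here */
        break;
    case ST_TIME_WAIT:
        if (e == EV_2MSL_TIMEOUT) return ST_CLOSED;
        break;
    case ST_CLOSED:
        break;
    }
    return s;
}
```

Feeding it EV_CLOSE, EV_RCV_ACK_OF_FIN, EV_RCV_FIN from ST_ESTABLISHED ends in ST_TIME_WAIT. The passive side, fed EV_RCV_FIN, EV_CLOSE, EV_RCV_ACK_OF_FIN, ends in ST_CLOSED directly.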

What happened in our case, with the old version of the application,
was that as soon as the client started to log off the session, the
server-side application (on the FreeBSD server) would initiate closing of
the TCP session, making it the originator of the close (and leaving a
large number of sessions in TIME-WAIT on that side, which was not a
problem for the BSD box), while the Windows machine closed its socket
immediately and was happy all the time.

However, after we upgraded the application, when the client logged off
at the application level, the server-side app would first take 2-3 seconds
to process various shutdown-related activities, and the client end (on
the Windows machine) got "impatient" and initiated the TCP session close
from its side, leaving all the TIME-WAIT sockets hanging on the Windows
side rather than the FreeBSD side.

Now, newer versions of Windows have a ridiculously low number of max 
simultaneous connections configured, and we started seeing exactly the 
same kinds of errors you are describing, due to a large number of 
TIME-WAIT sockets. We had to adjust the server-side application to tear 
down the TCP socket first, THEN do its internal shutdown processing, in 
order to not leave the Windows client in a jam. The alternative was to 
increase the number of simultaneous connections on the Windows machine, 
which involves some registry black magic, and we found this to be the 
easier way out (then - we will probably hack the Windows regkeys if we 
start seeing the issue again).

You didn't mention what MTA you are using, so I don't know if this is a 
similar (application-level) issue, or if it's FreeBSD 4.10 that causes 
some additional delay before initiating a TCP CLOSE, but either way, this 
might be the behaviour you are observing, in which case you will need to 
figure out how to get the FreeBSD side to tear down the connection, or 
preferably you should look at tuning some registry stuff on your 
Windows server - like setting the MSL time (default 2 minutes) to a much 
lower value, and perhaps upping the no. of max simultaneous connections.
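
As a rough back-of-the-envelope check, the rate of new outbound connections a client can sustain is bounded by its usable local ports divided by the TIME-WAIT duration. The helper below is a hypothetical sketch. The 240-second figure is the 4-minute wait mentioned above, while the port count of 3976 (an ephemeral range of 1025 through 5000) is an assumption about older Windows defaults that this thread does not state.

```c
/*
 * Upper bound on new outbound connections per second when every closed
 * connection keeps its local port in TIME-WAIT for time_wait_secs.
 * Both inputs are supplied by the caller; nothing is read from the system.
 */
static double
max_connect_rate(unsigned usable_ports, unsigned time_wait_secs)
{
    return (double)usable_ports / (double)time_wait_secs;
}
```

Under those assumptions, max_connect_rate(3976, 240) comes to roughly 16.6 connections per second. A 30-second wait would raise the bound to about 132 per second, which is why lowering the wait time helps.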

HTH,
/leg