Hello!

I take my words back. Manfred is right, this requirement is not a MUST.

The real problem is much worse, and the blame lies wholly with Solaris.
The tcpdump trace shows at least two different bugs there.


  2060  16:31:42.879337 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67580(0) ack 1582261 win 1460 (DF)
  2061  16:31:42.907940 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67580 win 1460 (DF)

All is OK so far. Solaris's state should be:

SND.NXT=SND.UNA=67580
SND.WND=1460
RCV.NXT=1582261

  2062  16:31:42.908620 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67581(1) ack 1583721 win 0 (DF)

Solaris sends one byte.

SND.NXT++
RCV.NXT=1583721


  2063  16:31:43.098761 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67581 win 1460 (DF)

We ACK it.

  2064  16:31:43.100993 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 67581:68456(875) ack 1583721 win 0 (DF)
  2065  16:31:43.101524 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 68456:69041(585) ack 1583721 win 0 (DF)

Solaris sends two segments (875 + 585 = 1460 bytes), filling the whole window.

SND.NXT=69041
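
The bookkeeping up to this point can be replayed as a small sketch. The variable names follow RFC 793; the replay itself is only an illustration built from the numbers in the trace, not anything taken from Solaris.

```python
# Sender-side state of the Solaris end, replayed from packets 2060-2065.
snd_una = 67580   # oldest unacknowledged byte (state after packet 2061)
snd_nxt = 67580   # next byte to send
snd_wnd = 1460    # window advertised by Linux in packet 2061

# Packet 2062: Solaris sends one byte.
snd_nxt += 1

# Packet 2063: Linux acknowledges that byte and advertises win 1460 again.
snd_una, snd_wnd = 67581, 1460

# Packets 2064 and 2065: segments of 875 and 585 bytes.
snd_nxt += 875
snd_nxt += 585

in_flight = snd_nxt - snd_una   # bytes sent but not yet acknowledged
usable = snd_wnd - in_flight    # what the sender may still put on the wire

print(snd_nxt, in_flight, usable)   # 69041 1460 0
```

With the usable window at 0, a sane sender has nothing more it is allowed to send until the peer opens the window.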


  2066  16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583720:1583720(0) ack 69041 win 0 (DF)

We send a zero-window probe with SEG.SEQ=1583720.

Solaris accepts the ACK from it!!! (bug #1) But it does not accept the window.

So, now it thinks that SND.UNA=SND.NXT=69041
                       SND.WND=1460

The state is corrupted.
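
A minimal sketch of the two RFC 793 checks involved, applied to packet 2066 as Solaris sees it. Assumptions: RCV.WND=0 is taken from the "win 0" Solaris advertises, SND.WL1 and SND.WL2 are assumed to have been set last from packet 2063 (seq 1583721, ack 67581), and sequence-number wraparound is ignored. It illustrates the rules only and says nothing about how Solaris is really implemented.

```python
# Sketch only: plain integer comparisons, sequence wraparound ignored.

def acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """RFC 793 acceptability test for an incoming segment (four cases)."""
    def in_window(seq):
        return rcv_nxt <= seq < rcv_nxt + rcv_wnd
    if seg_len == 0:
        return seg_seq == rcv_nxt if rcv_wnd == 0 else in_window(seg_seq)
    if rcv_wnd == 0:
        return False
    return in_window(seg_seq) or in_window(seg_seq + seg_len - 1)

def window_update_allowed(seg_seq, seg_ack, snd_wl1, snd_wl2):
    """RFC 793 guard that keeps old segments from updating SND.WND."""
    return snd_wl1 < seg_seq or (snd_wl1 == seg_seq and snd_wl2 <= seg_ack)

# Packet 2066: seq 1583720, no data, ack 69041, win 0.
probe_ok = acceptable(seg_seq=1583720, seg_len=0, rcv_nxt=1583721, rcv_wnd=0)

# Assumed values: SND.WL1/SND.WL2 as left by packet 2063.
wnd_ok = window_update_allowed(seg_seq=1583720, seg_ack=69041,
                               snd_wl1=1583721, snd_wl2=67581)

print(probe_ok, wnd_ok)   # False False
```

By the letter of the acceptability table the probe is out of window, and the window-update guard rejects it too. Taking the ACK field while rejecting the window is what leaves SND.UNA=69041 paired with the stale SND.WND=1460.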

This is a serious bug, but it is still not fatal. Actually, such corruptions
(though for different reasons) are common in stacks that borrowed code
from BSD. See tcp-impl, Subj: "Send window update algorithm ..."
They are recoverable, provided the stack is sane.


  2067  16:31:43.110623 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69041:69628(587) ack 1583721 win 0 (DF)

Solaris sends some crap outside the window because of the corrupted state.
No problem so far.


  2068  16:31:43.110679 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 69041 win 0 (DF)

We say "No pasaran", of course.

According to the rules, Solaris must shrink its send window now.
This is the only way to recover from the corrupted state.


  2069  16:31:43.111641 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69628:70501(873) ack 1583721 win 0 (DF)

It does not. This is the point after which recovery is impossible.
Fatal bug #2.
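
The recovery that packet 2068 should have triggered can be sketched as follows. This is a hedged illustration of the RFC 793 ACK processing, not Solaris or Linux code. The starting state is the corrupted one after packet 2067, SND.WL1 and SND.WL2 are assumed to still hold the values from packet 2063, and wraparound is ignored.

```python
# Corrupted sender state after packet 2067 (SND.WL1/WL2 are assumptions).
state = {"snd_una": 69041, "snd_nxt": 69628, "snd_wnd": 1460,
         "snd_wl1": 1583721, "snd_wl2": 67581}

def on_ack(st, seg_seq, seg_ack, seg_wnd):
    """RFC 793 style ACK processing for an acceptable segment."""
    if st["snd_una"] <= seg_ack <= st["snd_nxt"]:
        st["snd_una"] = seg_ack
        # Update the send window only from segments that are not stale.
        if st["snd_wl1"] < seg_seq or (
                st["snd_wl1"] == seg_seq and st["snd_wl2"] <= seg_ack):
            st["snd_wnd"] = seg_wnd
            st["snd_wl1"] = seg_seq
            st["snd_wl2"] = seg_ack
    return st

def usable_window(st):
    """Bytes the sender may still transmit; zero or less means stop."""
    return st["snd_una"] + st["snd_wnd"] - st["snd_nxt"]

# Packet 2068: seq 1583721 (in window this time), ack 69041, win 0.
on_ack(state, seg_seq=1583721, seg_ack=69041, seg_wnd=0)

print(state["snd_wnd"], usable_window(state))   # 0 -587
```

Under these assumptions the window shrinks to 0 and the usable window goes negative, so the 873 bytes of packet 2069 should never have been sent.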


To sum up: it is impossible to fix this from the Linux side.
We could accept ACK and WIN from out-of-window segments, and that
would help in this case _occasionally_. But Solaris is still
doomed to lock up randomly with such sawdust in its head.

Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/