On 06.05.2013 12:37, Lawrence Stewart wrote:
[ccing freebsd-net@ so my problem description enters the collective
subconscious in case I forget about this again]

For everyone tuning in, Aris asked me the apt question of why siftr(4)'s
"# inflight bytes" field doesn't take into account sacked bytes.

On 05/06/13 18:45, Angelogiannopoulos, Aris wrote:
On 05/06/13 18:31, Lawrence Stewart wrote:
On 05/06/13 18:18, Angelogiannopoulos, Aris wrote:
I noticed that the sent_inflight_bytes variable doesn’t take
into account the SACKd bytes in case SACK is used.

Is there a specific reason for this?

There is a very good reason - I was too lazy at the time ;)

More specifically, calculating the number of bytes SACKed by the
receiver is more difficult than pulling simple variables out of the
TCP control block and I was new to the TCP stack when I wrote
SIFTR. I've simply never bothered to go back and improve things in
this regard.

isn't it as simple as (tp->snd_nxt - tp->snd_fack) +
tp->sackhint.sack_bytes_rexmit ? This is the way the stack is doing
it in case of more than three duplicate acks.

You would hope so given the comment accompanying that calculation in
tcp_do_segment(), but I'm pretty sure that when I looked into this many
moons ago, I found that calculation to be incorrect. I should obviously
have gotten to the bottom of the problem back then, but from memory it
was not crucial to what I was working on at the time and I subsequently
forgot about it. It's good to come back to this now as it should be fixed.

It's not accurate but also not totally wrong.

Should we change it to better reflect the conditions on the
channel?

Ideally, yes. Feel like having a go and submitting a patch? I can
give you some hints on how to do it if you want a concrete starting
point.

Sure why not?

My write up below is partially from memory and from a 2 min look at the
code just now i.e. parts may be incorrect - please use it as a starting
point only...

So snd_fack effectively acts as a proxy for snd_una while SACK recovery
is happening i.e. it tracks the highest sequence number we've been
cumulatively SACKed for (left edge of the window).

snd_fack tracks the highest SEQ# that has been SACKed to us so we don't
retransmit beyond it during SACK recovery.

sack_bytes_rexmit counts the number of bytes we've retransmitted from
the known holes during the current recovery episode.

Yes, this is to avoid resending already resent data on the next incoming
(S)ACK and to find the next available hole to resend.

snd_max is the highest sequence number we've sent (right edge of the
window).

Yes.

Another important bit of info is that the sender-side sack scoreboard
stores holes (bytes missing at the receiver), whereas the sack blocks
that come in on the wire refer to received byte ranges (bytes
successfully received at the receiver).

Yes.  It's important to avoid confusion here.  We're talking about sender
side SACK (for retransmits we have to send).  There's also, but unrelated
to this discussion, the SACK blocks we send based on the data we received
and is waiting in the reassembly queue.

Therefore from the senders perspective, I believe the "pseudo"
calculation for bytes in flight during a SACK recovery episode should
look something like:

(snd_max - snd_fack) - total_hole_bytes + sack_bytes_rexmit

This isn't accurate either.  It likely over-estimates the amount of data
still in flight with multiple holes while "(snd_max - snd_fack) + 
sack_bytes_rexmit"
likely under-estimates it in the presence of re-ordering.

So instead of total_hole_bytes only those bytes from the holes that are assumed
lost should be counted per the hole retransmission rules.  It would require
some serious work to make our current SACK code track this contiguously. Also
for this reason SACK doesn't / shouldn't immediately retransmit any holes but
wait for more (S)ACKs to arrive (as specified in the applicable RFCs).
I haven't verified that our code does the right thing in all cases.

For reference the graphical representation:

          snd_una                                     snd_max
snd_seq .....|ooo#########ooo#######xxx#######***********|.....
                                    ^        ^
                                 snd_nxt  snd_fack
new_inflight                                  <--------->
hole_bytes    <->    +    <->   +   <->
sack_rexmit   <->    +    <->
total_inflt   <->    +    <->        +        <--------->

 x hole
 o hole retansmitted
 # SACKed
 * new inflight

To play devils advocate likely the best approximation compromise we can do at
the moment is:

inflight = (snd_max - snd_fack) + ((total_hole_bytes - sack_bytes_rexmit) / 2);

3 of those variables are already around and (hopefully) usable, but I
don't believe we currently track a variable equivalent to total_hole_bytes.

Some of them only contain valid information when SACK is active.

Hopefully someone will chime in and correct me if any of the above is
misguided. Otherwise, your challenge, should you choose to accept it, is
to patch SIFTR to perform the above calculation, and to verify that the
reported result is correct - perhaps by running some controlled lab
experiments where you can review the value reported by SIFTR against the
known SACK blocks and sequence numbers on the wire during the recovery
episode.

That would be an excellent verification.

You could choose to make total_hole_bytes a sackhint like
sack_bytes_rexmit, but I suggest you create a first version of the patch
which manually walks the sack scoreboard each time through
siftr_siftdata() to sum the # bytes in each hole to ensure you don't
inadvertently introduce accounting bugs while developing the patch.
Patch v2 can include total_hole_bytes as a sackhint.

Agreed.  There's a limit on the number of SACK holes so in certain situations
it may not be able to accurately reflect all available information.

--
Andre

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Reply via email to