Hi Alan,

On Mar 18, 2015, at 23:14 , Alan Jenkins <alan.christopher.jenk...@gmail.com> 
wrote:

> Hi Seb
> 
> I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL.  (On Barrier 
> Breaker + sqm-scripts).  Maybe this is going back a bit & no longer 
> interesting to read.  But it seemed suspicious & interesting enough that I 
> wanted to test it.
> 
> My conclusion was 1) I should stick with pppoe-wan,

        Not a bad decision, especially given the recent changes to SQM that 
make it survive transient pppoe-interface disappearances. Before those changes, 
the beauty of shaping on the ethernet device was that pppoe could come and go 
while SQM stayed active and working. But thanks to your help this problem now 
seems fixed.

> 2) the question really means do you want to disable classification
> 3) I personally want to preserve the upload bandwidth and accept slightly 
> higher latency.

        My question remains: is the bandwidth sacrifice really necessary, or 
does this test just expose a corner case in simple.qos that can be fixed? I 
currently lack the time to tackle this effectively.

> 
> 
> On 15/10/14 01:03, Sebastian Moeller wrote:
>> Hi All,
>> 
>> some more testing: On Oct 12, 2014, at 01:12 , Sebastian Moeller
>> <moell...@gmx.de> wrote:
> 
>>> 1) SQM on ge00 does not show a working egress classification in the
>>> RRUL test (no visible “banding”/stratification of the 4 different
>>> priority TCP flows), while SQM on pppoe-ge00 does show this
>>> stratification.
> 
>> Using tc's u32 filters makes it possible to actually dive into
>> PPPoE-encapsulated IPv4 and IPv6 packets and perform classification
>> on “pass-through” PPPoE packets (as encountered when starting SQM on
>> ge00 instead of pppoe-ge00, if the latter actually handles the WAN
>> connection), so that one is solved (but see below).
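
        (For concreteness, the kind of filter meant here looks roughly like 
the sketch below. This is illustrative only: the device name ge00 and the 
target class 1:11 are assumptions following the simple.qos layout, and the EF 
match is just one example rule.)

        # Sketch: classify EF-marked IPv4 packets inside PPPoE session
        # frames on the underlying ethernet device. Offsets: 6-byte PPPoE
        # header, then the 2-byte PPP protocol field (0x0021 = IPv4), so
        # the IPv4 ToS byte sits at offset 6 + 2 + 1 = 9.
        tc filter add dev ge00 parent 1:0 protocol ppp_ses prio 10 u32 \
            match u16 0x0021 0xffff at 6 \
            match u8 0xb8 0xfc at 9 \
            flowid 1:11
        # IPv6 analog: PPP protocol 0x0057 at offset 6; the DSCP sits in
        # the first 16 bits of the IPv6 header starting at offset 8, e.g.
        #     match u16 0x0b80 0x0fc0 at 8
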
>> 
>>> 
>>> 2) SQM on ge00 shows better latency under load (LUL); the LUL
>>> increases by ~2*fq_codel's target, so 10ms, while SQM on
>>> pppoe-ge00 shows a LUL increase (LULI) roughly twice as large,
>>> around 20ms.
>>> 
>>> I have no idea why that is, if anybody has an idea please chime
>>> in.
> 
> I saw the same, though with a larger difference in egress rate.  See the 
> first three files here:
> 
> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
> 
> [netperf-wrapper noob puzzle: most of the ping lines vanish part-way through. 
>  Maybe I failed it somehow.]

        This is not your fault; the UDP probes netperf-wrapper uses do not 
tolerate packet loss: once a packet is lost (I believe) the stream stops. This 
is not ideal, but it gives a good quick indicator of packet loss for sparse 
streams ;)

> 
>> Once SQM on ge00 actually dives into the PPPoE packets and
>> applies/tests u32 filters the LUL increases to be almost identical to
>> pppoe-ge00’s if both ingress and egress classification are active and
>> do work. So it looks like the u32 filters I naively set up are quite
>> costly. Maybe there is a better way to set these up...
> 
> Later you mentioned testing for coupling with egress rate.  But you didn't 
> test coupling with classification!

        True, I was interested in getting the 3-tier shaper to behave sanely, 
so I did not look at the 1-tier simplest.qos.

> 
> I switched from simple.qos to simplest.qos, and that achieved the lower 
> latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the real 
> problem.

        Erm, but simplest.qos does not use the relevant tc filters, so these 
could still account for the issue; that, or some loss due to the three HTB 
shapers...
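
        To make the structural difference concrete, here is a simplified 
sketch of what simple.qos's egress side boils down to (not the actual script; 
the device name, rates, and the 95% ceil are illustrative assumptions):

        # One HTB root with three leaf classes (priority / best effort /
        # background), each feeding its own fq_codel instance.
        UPLINK=950   # shaped uplink rate in kbit/s; an example value
        tc qdisc add dev ge00 root handle 1: htb default 12
        tc class add dev ge00 parent 1: classid 1:1 htb rate ${UPLINK}kbit
        tc class add dev ge00 parent 1:1 classid 1:11 htb \
            rate $((UPLINK / 3))kbit ceil $((UPLINK * 95 / 100))kbit prio 1
        tc class add dev ge00 parent 1:1 classid 1:12 htb \
            rate $((UPLINK / 3))kbit ceil $((UPLINK * 95 / 100))kbit prio 2
        tc class add dev ge00 parent 1:1 classid 1:13 htb \
            rate $((UPLINK / 3))kbit ceil $((UPLINK * 95 / 100))kbit prio 3
        tc qdisc add dev ge00 parent 1:11 fq_codel
        tc qdisc add dev ge00 parent 1:12 fq_codel
        tc qdisc add dev ge00 parent 1:13 fq_codel
        # simplest.qos, by contrast, boils down to the htb root, a single
        # class, and one fq_codel, with no classification filters at all.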
> 
> I did think ECN wouldn't be applied on eth1, and that would be the cause of 
> the latency.  But disabling ECN didn't affect it.  See files 3 to 6:
> 
> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

        We typically only enable ECN on the downlink so far (under the 
assumption that a mark is a faster congestion signal to the receiver than 
dropping the packet and then having to wait for the following packets to 
create dupACKs; typically the router is close to the end hosts and the packets 
have already cleared the real bottleneck, so dropping them is not going to 
help effective bandwidth use). On the uplink the reasoning reverses: there, 
dropping instead of marking saves bandwidth for other packets (and uplink 
bandwidth is often more precious), and the packets have basically just started 
their journey, so the control loop can still take a long time to complete and 
other hops can drop the packet anyway. (I guess my current link is fast enough 
to activate ECN on the uplink as well to see how that behaves, so I will try 
that for a bit...)
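
        In tc terms this asymmetric policy is just a per-direction fq_codel 
flag; a rough sketch (the device and class names are assumptions, with SQM 
shaping ingress on an IFB device):

        # downlink (ingress, shaped on the IFB): mark instead of drop
        tc qdisc add dev ifb4pppoe-ge00 parent 1:11 fq_codel ecn
        # uplink (egress): drop instead of mark
        tc qdisc add dev pppoe-ge00 parent 1:11 fq_codel noecn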

> 
> I also admit surprise at fq_codel working within 20%/10ms on eth1.  I thought 
> it'd really hurt, by breaking the FQ part.  Now I guess it doesn't.  I still 
> wonder about ECN marking, though I didn't check whether my endpoint is using 
> ECN.
> 
>>> 
>>> 3) SQM on pppoe-ge00 has a roughly 20% higher egress rate than SQM
>>> on ge00 (with ingress more or less identical between the two).
>>> Also, 2) and 3) do not seem to be coupled: artificially reducing
>>> the egress rate on pppoe-ge00 to yield the same egress rate as seen
>>> on ge00 does not reduce the LULI to the ge00-typical 10ms; it stays
>>> at 20ms.
>>> 
>>> For this I also have no good hypothesis, any ideas?
>> 
>> With classification fixed the difference in egress rate shrinks to
>> ~10% instead of 20, so this partly seems related to the
>> classification issue as well.
> 
> My tests look like simplest.qos gives a lower egress rate, but not as low as 
> eth1.  (Like 20% vs 40%).  So that's also similar.
> 
>>> So the current choice is either to accept a noticeable increase in
>>> LULI (but note that some years ago even an average of 20ms was most
>>> likely rare in real life) or an equally noticeable decrease in
>>> egress bandwidth…
>> 
>> I guess it is back to the drawing board to figure out how to speed up
>> the classification… and then revisit the PPPoE question again…
> 
> so maybe the question is actually classification vs. not?
> 
> + IMO slow asymmetric links don't want to lose more upload bandwidth than 
> necessary.  And I'm losing a *lot* in this test.
> + As you say, having only 20ms excess would still be a big improvement.  We 
> could ignore the bait of 10ms right now.
> 
> vs
> 
> - lowest latency I've seen testing my link. almost suspicious. looks close to 
> 10ms average, when the dsl rate puts a lower bound of 7ms on the average.

        Curious: what is your link speed?

> - fq_codel honestly works miracles already. classification is the knob people 
> had to use previously, who had enough time to twiddle it.
> - on netperf-wrapper plots the "banding" doesn't look brilliant on slow links 
> anyway

        On slow links I always used to add “-s 0.8” (using higher numbers the 
slower the link is) to increase the temporal averaging window; this reduces 
the accuracy of the display for the downlink, but at least allows a better 
understanding of the uplink. I always wanted to see whether I could teach 
netperf-wrapper to allow larger averaging windows after the measurement, just 
for display purposes, but I am a total beginner with python...
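
        For reference, a typical invocation would look something like this 
(the hostname is a placeholder):

        # widen the averaging window to 0.8s to smooth RRUL plots on a
        # slow link; -l sets the test length in seconds
        netperf-wrapper -H netperf.example.org -l 60 -s 0.8 rrul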

> 
> 
>> Regards Sebastian
>> 
>>> 
>>> Best Regards Sebastian
>>> 
>>> P.S.: It turns out, at least on my link, that for shaping on
>>> pppoe-ge00 the kernel does not account for any header
>>> automatically, so I need to specify a per-packet overhead (PPOH) of
>>> 40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on
>>> ge00 however (with the kernel still terminating the PPPoE link to
>>> my ISP) I only need to specify a PPOH of 26 bytes, as the kernel
>>> already adds the 14 bytes for the ethernet header…

        Please disregard this part, I need to implement better tests for this 
instead of only relying on netperf-wrapper results ;)
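
        (Mechanically, whichever byte count turns out to be correct gets 
passed to the shaper via tc's stab option; a sketch with the values as 
reported above, pending those better tests:)

        # shaping on the PPPoE device: account for the full 40 bytes
        tc qdisc add dev pppoe-ge00 root handle 1: stab linklayer atm \
            overhead 40 htb default 12
        # shaping on the ethernet device: the kernel already adds the
        # 14-byte ethernet header, so only 26 bytes remain
        tc qdisc add dev ge00 root handle 1: stab linklayer atm \
            overhead 26 htb default 12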


_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel
