On 14/05/2023 10:57 am, David Lang wrote:
On Sat, 13 May 2023, Ulrich Speidel via Starlink wrote:

Here's a bit of a question to you all. See what you make of it.

I've been thinking a bit about the latencies we see in the Starlink network. This is why this list exists (right, Dave?). So what do we know?

1) We know that RTTs can be in the hundreds of ms even in what appear to be bent-pipe scenarios where the physical one-way path should be well under 3000 km, with physical RTT under 20 ms.
2) We know from plenty of traceroutes that these RTTs accrue in the Starlink network, not between the Starlink handover point (POP) and the Internet.
3) We know that they aren't an artifact of the Starlink WiFi router (our traceroutes were done through their Ethernet adapter, which bypasses the router), so they must be delays on the satellites or at the teleports.
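
As a rough sanity check on point 1, here is a quick back-of-the-envelope in Python. The altitude and ground distance are assumptions of mine, purely for illustration, not Starlink figures:

import math

# Back-of-the-envelope check of the "physical RTT under 20 ms" claim.
# All numbers below are assumed: a ~550 km shell altitude and a user
# and gateway roughly 1000 km apart on the ground.

C_KM_PER_S = 299_792.458      # speed of light in vacuum

altitude_km = 550.0           # assumed orbital altitude
ground_separation_km = 1000.0 # assumed user-to-gateway distance

# Crude bent-pipe geometry: ignore Earth's curvature and put the
# satellite midway between user and gateway.
slant_km = math.hypot(altitude_km, ground_separation_km / 2)
one_way_km = 2 * slant_km                    # user -> satellite -> gateway
rtt_ms = 2 * one_way_km / C_KM_PER_S * 1000  # and back again

print(f"one-way path ~{one_way_km:.0f} km, physical RTT ~{rtt_ms:.1f} ms")
# roughly 1500 km one way, i.e. around 10 ms of pure propagation delay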

the ethernet adapter bypasses the wifi, but not the router; you have to cut the cable and replace the plug to bypass the router
Good point - but you still don't get the WiFi buffering here. Or at least we don't seem to, looking at the difference between running with and without the adapter.

4) We know that processing delay isn't a huge factor because we also see RTTs well under 30 ms.
5) That leaves queuing delays.

This issue has been known for a while now. Starlink have been innovating their heart out around pretty much everything here - and yet, this bufferbloat issue hasn't changed, despite Dave proposing what appears to be an easy fix compared to a lot of other things they have done. So what are we possibly missing here?

Going back to first principles: The purpose of a buffer on a network device is to act as a shock absorber against sudden traffic bursts. If I want to size that buffer correctly, I need to know at the very least (paraphrasing queueing theory here) something about my packet arrival process.

The question is over what timeframe. If you have a huge buffer, you can buffer tens of seconds of traffic and eventually send it. That will make benchmarks look good, but not the user experience. The rapid drop in RAM prices and benchmark scores that heavily penalized any dropped packets encouraged buffers to get larger than is sane.

it's still a good question to define what is sane: the longer the buffer, the more chance of finding time to catch up, but having packets in the buffer that have already timed out (e.g. DNS queries tend to time out after 3 seconds, and TCP will give up and send replacement packets, making the original packets meaningless) is counterproductive. What is the acceptable delay for your users?

Here at the bufferbloat project, we tend to say that buffers past a few tens of ms worth of traffic are probably bad, and we are aiming for single-digit ms in many cases.
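
To translate those delay targets into buffer sizes, here is a rough conversion at an assumed 100 Mbit/s link rate (the rate is purely illustrative, not a Starlink number):

# Rough numbers for the buffer sizes being discussed, at an assumed
# (hypothetical) 100 Mbit/s link; none of these rates come from Starlink.

rate_bps = 100e6  # assumed link rate, bits per second

def buffer_bytes(delay_s: float, rate_bps: float) -> float:
    """Bytes of queue needed to hold delay_s seconds of traffic at rate_bps."""
    return rate_bps * delay_s / 8

print(f"10 s of buffering  : {buffer_bytes(10, rate_bps) / 1e6:.0f} MB")    # ~125 MB
print(f"30 ms of buffering : {buffer_bytes(0.030, rate_bps) / 1e3:.0f} kB") # ~375 kB
print(f"5 ms of buffering  : {buffer_bytes(0.005, rate_bps) / 1e3:.0f} kB") # ~62 kB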
Taken as read.

If I look at conventional routers, then that arrival process involves traffic generated by a user population that changes relatively slowly: WiFi users come and go. One at a time. Computers in a company get turned on and off and rebooted, but there are no instantaneous jumps in load - you don't suddenly have a hundred users in the middle of watching Netflix turning up that weren't there a second ago. Most of what we know about Internet traffic behaviour is based on this sort of network, and this is what we've designed our queuing systems around, right?

not true: for businesses, every hour as meetings start and let out, and as people arrive in the morning or come back from lunch, you get very sharp changes in the traffic.
And herein lies the crunch: All of these things that you list happen over much longer timeframes than a switch to a different satellite. Also, folk coming back from lunch would start with something like cwnd=10. Users whose TCP connections get switched over to a different satellite by some underlying tunneling protocol could have much larger cwnd.

at home you have fewer changes in users, but you also may have less bandwidth (although many tech enthusiasts have more bandwidth than many companies; two of my last 3 jobs had <400Mb at their main office with hundreds of employees, while many people would consider that 'slow' for home use). As such, a parent arriving home with a couple of kids will make a drastic change to the network usage in a very short time.
I think you've missed my point - I'm talking about changes in the network mid-flight, not people coming home and getting started over a period of a few minutes. The change you see in a handover is sudden, probably with a sub-second ramp-up. And it's something that doesn't just happen when people come home or return from lunch - it happens every few minutes.


but the active queueing systems that we are designing (cake, fq_codel) handle these conditions very well because they don't try to guess what the usage is going to be; they just look at the packets that they have to process and figure out how to dispatch them in the best way.
Understood - I've followed your work.

because we have observed that latency tends to be more noticeable for short connections (DNS, checking if cached web pages are up to date, etc), our algorithms give a slight priority to new, low-traffic connections over long-running, high-traffic connections rather than just splitting the bandwidth evenly across all connections, and can even go further to split bandwidth between endpoints, not just connections (with endpoints being a configurable definition)
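
A deliberately simplified sketch of that "new flows first" idea is below; the class, names and quantum are mine, and the real fq_codel/cake code in the kernel does much more (flow hashing, CoDel dropping, per-host fairness), so treat this as an illustration only:

from collections import deque

class TinyFQ:
    """Toy scheduler illustrating the new-flow boost from fq_codel.

    Flows that have just become active sit in new_flows and are served
    ahead of old_flows; a flow that keeps sending migrates to the old
    list, so sparse traffic (DNS, small web requests) sees little
    queueing behind bulk transfers.  Not the actual kernel algorithm.
    """

    def __init__(self, quantum=1514):
        self.quantum = quantum
        self.queues = {}        # flow id -> deque of (length, payload)
        self.deficit = {}       # flow id -> byte credit
        self.new_flows = deque()
        self.old_flows = deque()

    def enqueue(self, flow, pkt_len, payload=None):
        if flow not in self.queues:
            self.queues[flow] = deque()
            self.deficit[flow] = self.quantum
            self.new_flows.append(flow)   # brand-new flow: served first
        self.queues[flow].append((pkt_len, payload))

    def dequeue(self):
        while self.new_flows or self.old_flows:
            lst = self.new_flows if self.new_flows else self.old_flows
            flow = lst[0]
            q = self.queues[flow]
            if not q:                      # flow went idle: forget it
                lst.popleft()
                del self.queues[flow], self.deficit[flow]
                continue
            pkt_len, payload = q[0]
            if self.deficit[flow] < pkt_len:
                # Out of credit: top up and demote to the old-flow list.
                self.deficit[flow] += self.quantum
                lst.popleft()
                self.old_flows.append(flow)
                continue
            q.popleft()
            self.deficit[flow] -= pkt_len
            return flow, pkt_len, payload
        return None

Feeding a steady bulk flow plus a single late-arriving DNS packet into this scheduler shows the DNS packet being dispatched almost immediately, rather than behind the whole bulk backlog.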

without active queue management, the default is FIFO, which allows the high-user-impact, short-connection packets to sit in a queue behind the low-user-impact bulk data transfers. For benchmarks, a-packet-is-a-packet and they all count, so until you have enough buffering that you start having expired packets in flight, it doesn't matter, but for the user experience, there can be a huge difference.
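
For a concrete (entirely made-up) example of what that does to the user:

# How long a DNS query sits behind bulk data in a plain FIFO queue,
# using assumed numbers: 5 MB of bulk packets already queued ahead of
# it, draining at an assumed 50 Mbit/s.

bytes_ahead = 5e6      # bulk data queued in front of the DNS packet
rate_bps = 50e6        # assumed drain rate of the bottleneck

wait_s = bytes_ahead * 8 / rate_bps
print(f"DNS packet waits ~{wait_s:.2f} s")   # ~0.8 s before it even leaves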

All understood - you're preaching to the converted. It's just that I think Starlink may be a different ballpark.

Put another way: If a protocol (TCP) that is designed to reasonably expect that its current cwnd is OK to use for now gets put into a situation where there are relatively frequent, huge and lasting step changes in available BDP within sub-second periods, are your underlying assumptions still valid?
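
To make the scale of that step change concrete (with made-up numbers, since I don't know the actual per-beam capacities or per-flow shares):

# Illustration of the step-change problem, with assumed numbers: a flow
# that has grown its cwnd to match one satellite path is suddenly
# switched to a path with a much smaller usable share.

mss = 1448             # bytes per segment (assumed)
rtt_s = 0.040          # assumed 40 ms RTT on both paths

old_rate_bps = 200e6   # assumed per-flow share before the handover
new_rate_bps = 20e6    # assumed share after switching to a loaded satellite

old_bdp_pkts = old_rate_bps * rtt_s / 8 / mss   # ~690 packets
new_bdp_pkts = new_rate_bps * rtt_s / 8 / mss   # ~69 packets

# A sender still running with the old cwnd keeps roughly cwnd - new_BDP
# packets parked in a queue somewhere until its congestion controller reacts.
excess_pkts = old_bdp_pkts - new_bdp_pkts
queue_delay_s = excess_pkts * mss * 8 / new_rate_bps

print(f"old BDP ~{old_bdp_pkts:.0f} pkts, new BDP ~{new_bdp_pkts:.0f} pkts")
print(f"excess in flight ~{excess_pkts:.0f} pkts -> ~{queue_delay_s*1000:.0f} ms of standing queue")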

I suspect they're handing over whole cells at a time, not individual users.


David Lang

--
****************************************************************
Dr. Ulrich Speidel

School of Computer Science

Room 303S.594 (City Campus)

The University of Auckland
u.spei...@auckland.ac.nz
http://www.cs.auckland.ac.nz/~ulrich/
****************************************************************



_______________________________________________
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
