Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Jason Iannone
Here's a question I haven't bothered to ask until now. Can someone please
help me understand why I receive a ping reply after almost 5 seconds? As I
understand it, buffers in SP gear are generally 100ms. According to my math
this round trip should have been discarded around the 1 second mark, even
in a long path. Maybe I should buy a lottery ticket. I don't get it. What
is happening here?

Jason

64 bytes from 4.2.2.2: icmp_seq=392 ttl=54 time=4834.737 ms
64 bytes from 4.2.2.2: icmp_seq=393 ttl=54 time=4301.243 ms
64 bytes from 4.2.2.2: icmp_seq=394 ttl=54 time=3300.328 ms
64 bytes from 4.2.2.2: icmp_seq=396 ttl=54 time=1289.723 ms
Request timeout for icmp_seq 400
Request timeout for icmp_seq 401
64 bytes from 4.2.2.2: icmp_seq=398 ttl=54 time=4915.096 ms
64 bytes from 4.2.2.2: icmp_seq=399 ttl=54 time=4310.575 ms
64 bytes from 4.2.2.2: icmp_seq=400 ttl=54 time=4196.075 ms
64 bytes from 4.2.2.2: icmp_seq=401 ttl=54 time=4287.048 ms
64 bytes from 4.2.2.2: icmp_seq=403 ttl=54 time=2280.466 ms
64 bytes from 4.2.2.2: icmp_seq=404 ttl=54 time=1279.348 ms
64 bytes from 4.2.2.2: icmp_seq=405 ttl=54 time=276.669 ms
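The arithmetic behind the question can be checked directly from the output. This is a minimal sketch. It assumes ping sent one probe per second (its default interval), so probe N left at roughly N seconds on some arbitrary clock. Adding each reported RTT to that send time gives the moment the reply reached the host. The regular expression, the helper names, and the 50 ms grouping window are illustrative assumptions and are not part of any ping tool.

```python
import re

# Ping output copied from the message above.
PING_OUTPUT = """\
64 bytes from 4.2.2.2: icmp_seq=392 ttl=54 time=4834.737 ms
64 bytes from 4.2.2.2: icmp_seq=393 ttl=54 time=4301.243 ms
64 bytes from 4.2.2.2: icmp_seq=394 ttl=54 time=3300.328 ms
64 bytes from 4.2.2.2: icmp_seq=396 ttl=54 time=1289.723 ms
Request timeout for icmp_seq 400
Request timeout for icmp_seq 401
64 bytes from 4.2.2.2: icmp_seq=398 ttl=54 time=4915.096 ms
64 bytes from 4.2.2.2: icmp_seq=399 ttl=54 time=4310.575 ms
64 bytes from 4.2.2.2: icmp_seq=400 ttl=54 time=4196.075 ms
64 bytes from 4.2.2.2: icmp_seq=401 ttl=54 time=4287.048 ms
64 bytes from 4.2.2.2: icmp_seq=403 ttl=54 time=2280.466 ms
64 bytes from 4.2.2.2: icmp_seq=404 ttl=54 time=1279.348 ms
64 bytes from 4.2.2.2: icmp_seq=405 ttl=54 time=276.669 ms
"""

# Hypothetical pattern for reply lines; timeout lines do not match it.
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def arrival_times(text, interval_s=1.0):
    """Return (seq, rtt_s, arrival_s) per reply, assuming probe N was sent
    at N * interval_s seconds."""
    samples = []
    for m in REPLY_RE.finditer(text):
        seq = int(m.group(1))
        rtt_s = float(m.group(2)) / 1000.0
        samples.append((seq, rtt_s, seq * interval_s + rtt_s))
    return samples


def bursts(samples, window_s=0.05):
    """Group replies whose arrival times are within window_s of the previous
    reply in the group; return the sorted sequence numbers of each group."""
    groups = []
    for seq, _rtt, arrival in sorted(samples, key=lambda s: s[2]):
        if groups and arrival - groups[-1][-1][1] <= window_s:
            groups[-1].append((seq, arrival))
        else:
            groups.append([(seq, arrival)])
    return [sorted(seq for seq, _ in group) for group in groups]


if __name__ == "__main__":
    for seq, rtt_s, arrival in arrival_times(PING_OUTPUT):
        print(f"seq={seq} rtt={rtt_s:.3f}s arrived at t={arrival:.3f}s")
    print(bursts(arrival_times(PING_OUTPUT)))
```

With Jason's numbers the replies group as [392], [393, 394, 396], [398], [399], [400], [401, 403, 404, 405]. The replies to 393, 394 and 396 arrive within about 12 ms of each other, and so do those to 401, 403, 404 and 405, even though those probes left up to four seconds apart. Several probes delivered together is consistent with a queue somewhere being released in one go. It does not fit each packet independently sitting in a 100 ms router buffer.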


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Mel Beckman
This is sometimes due to high CPU load on the target device. If the device is 
under heavy load, the ICMP Echo process gets the lowest priority. With a 
well-known name server like 4.2.2.2, this seems unlikely. It could be an 
intermediate hop or a routing loop. Do a traceroute to get more detailed 
per-hop statistics.

 -mel


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Mel Beckman
Keep in mind that ping reports round-trip time, so there could be a device 
delaying the ping reply on the return trip. In these cases, it helps to have a 
traceroute from both ends to detect asymmetric routing and possible return-path 
congestion that is invisible in a traceroute from your end.


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread William Herrin
On Wed, Dec 21, 2022 at 9:10 AM Jason Iannone  wrote:
> Here's a question I haven't bothered to ask until now. Can someone please 
> help me understand why I receive a ping reply after almost 5 seconds?
>
> 64 bytes from 4.2.2.2: icmp_seq=398 ttl=54 time=4915.096 ms
> 64 bytes from 4.2.2.2: icmp_seq=399 ttl=54 time=4310.575 ms
> 64 bytes from 4.2.2.2: icmp_seq=400 ttl=54 time=4196.075 ms
> 64 bytes from 4.2.2.2: icmp_seq=401 ttl=54 time=4287.048 ms
> 64 bytes from 4.2.2.2: icmp_seq=403 ttl=54 time=2280.466 ms
> 64 bytes from 4.2.2.2: icmp_seq=404 ttl=54 time=1279.348 ms
> 64 bytes from 4.2.2.2: icmp_seq=405 ttl=54 time=276.669 ms

Hi Jason,

This usually means a problem on the Linux machine originating the
packet. It has lost the ARP for the next hop or something similar so
the outbound ICMP packet is queued. The glitch repairs itself,
briefly, releasing the queued packets. Then it comes right back.

Regards,
Bill Herrin


-- 
For hire. https://bill.herrin.us/resume/


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Dave Taht
There's this thing called bufferbloat...


-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-698135607352320-FXtz
Dave Täht CEO, TekLibre, LLC


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread J. Hellenthal via NANOG
As well, if this persists, you may consider disabling hardware rx/tx checksum 
offloading to see if it clears up your results. Some NICs can get glitchy, 
causing this exact behavior.

GL

-- 
 J. Hellenthal

The fact that there's a highway to Hell but only a stairway to Heaven says a 
lot about anticipated traffic volume.


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread William Herrin
On Wed, Dec 21, 2022 at 1:20 PM Dave Taht  wrote:
> On Wed, Dec 21, 2022 at 11:58 AM William Herrin  wrote:
> > On Wed, Dec 21, 2022 at 9:10 AM Jason Iannone  
> > wrote:
> > > Here's a question I haven't bothered to ask until now. Can someone please 
> > > help me understand why I receive a ping reply after almost 5 seconds?
> > >
> > > 64 bytes from 4.2.2.2: icmp_seq=398 ttl=54 time=4915.096 ms
> > > 64 bytes from 4.2.2.2: icmp_seq=399 ttl=54 time=4310.575 ms
> > > 64 bytes from 4.2.2.2: icmp_seq=400 ttl=54 time=4196.075 ms
> > > 64 bytes from 4.2.2.2: icmp_seq=401 ttl=54 time=4287.048 ms
> > > 64 bytes from 4.2.2.2: icmp_seq=403 ttl=54 time=2280.466 ms
> > > 64 bytes from 4.2.2.2: icmp_seq=404 ttl=54 time=1279.348 ms
> > > 64 bytes from 4.2.2.2: icmp_seq=405 ttl=54 time=276.669 ms
> >
> > Hi Jason,
> >
> > This usually means a problem on the Linux machine originating the
> > packet. It has lost the ARP for the next hop or something similar so
> > the outbound ICMP packet is queued. The glitch repairs itself,
> > briefly, releasing the queued packets. Then it comes right back.

> There's this thing called bufferbloat...

Hi Dave,

Yes, but I've seen this particular pattern before and it's generally
not bufferbloat. With bufferbloat you usually see consistently long ping
times: this ping is 3 seconds, the next ping is 2.9, the next is 3.2.
This example had a descending pattern, with successive ping times
falling by exactly the one-second interval at which the ICMP messages
were sent. The descending pattern indicates something went wrong with
ARP, or a virtual machine was starved for CPU time and didn't run for a
couple of seconds, or something like that.
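Bill's rule of thumb can be written down as a crude heuristic. This is a sketch under stated assumptions and is not an established diagnostic. It expects RTTs for probes with consecutive sequence numbers, sent one interval apart. The function name and the 100 ms and 500 ms thresholds are hypothetical choices.

```python
def classify_rtts(rtts_s, interval_s=1.0, tolerance_s=0.1, high_rtt_s=0.5):
    """Crude heuristic over RTTs (seconds) of probes with consecutive
    sequence numbers, sent interval_s apart.

    "release-burst":  each RTT is about one interval shorter than the one
                      before, i.e. the replies arrived at about the same time.
    "standing-queue": RTTs are all high (>= high_rtt_s) but roughly level,
                      as with a persistently full buffer.
    "other":          anything else, including a healthy path.
    """
    if len(rtts_s) < 3:
        return "other"
    deltas = [later - earlier for earlier, later in zip(rtts_s, rtts_s[1:])]
    if all(abs(d + interval_s) <= tolerance_s for d in deltas):
        return "release-burst"
    mean = sum(rtts_s) / len(rtts_s)
    level = all(abs(r - mean) <= 0.5 * interval_s for r in rtts_s)
    if min(rtts_s) >= high_rtt_s and level:
        return "standing-queue"
    return "other"


print(classify_rtts([2.280466, 1.279348, 0.276669]))  # seq 403-405 above
print(classify_rtts([3.0, 2.9, 3.2]))                 # the bufferbloat example
```

On the seq 403 to 405 replies from Jason's output it reports a release burst. On the 3 / 2.9 / 3.2 second bufferbloat example it reports a standing queue. Real traces are noisier than either case, so treat the label as a hint about where to look and not as a diagnosis.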

Regards,
Bill Herrin



Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Joelle Maslak
You didn't tell us anything about your path or your endpoint, or whether you see
this just with Lumen's DNS servers or with other devices too, so it is hard to
guess what is going on here.

That said, I know I've seen this kind of behavior both with buffer bloat on
consumer devices (particularly the uplink direction) and wifi networks
(which can have surprisingly deep buffers, with retransmissions occurring
at layer 1.5/2).  My guess is that there is a software routing/switching
device somewhere in the path (wifi AP, home router, Linux or BSD router,
etc).

-- 
Sincerely,
Ms. Joelle Maslak


RE: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Jerry Cloe
Because there is no standard for discarding "old" traffic; the only discard is 
for packets that hop too many times. There is, however, a standard for 
decrementing TTL by 1 if a packet sits on a device for more than 1000ms, and of 
course we all know what happens when TTL hits zero. Based on that, your packet 
could have floated around for another 53 seconds. Having said that, I'm not sure 
many devices actually do this (but it's not likely it would have had a 
significant impact on this traffic anyway).
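Here are some numbers for Jerry's point. RFC 1812 says a router MUST reduce the TTL by at least one when forwarding. If it holds a packet for more than one second, it MAY decrement the TTL by one for each second. The sketch below assumes every router on a hypothetical ten-hop path implements that optional rule, reading "one for each second" as whole seconds held. The function name and hold times are illustrative.

```python
import math


def ttl_after_path(initial_ttl, hold_times_s):
    """TTL remaining after a path of routers that held the packet for the
    given number of seconds each.

    Assumes every router applies the optional per-second rule, decrementing
    by max(1, floor(seconds held)); returns 0 if the packet would be dropped.
    """
    ttl = initial_ttl
    for held_s in hold_times_s:
        ttl -= max(1, math.floor(held_s))
        if ttl <= 0:
            return 0  # TTL exhausted: the router would discard the packet
    return ttl


fast_path = [0.001] * 10              # ten hops, negligible queueing
slow_path = [0.001] * 9 + [4.8]       # same, but one hop queues for 4.8 s
print(ttl_after_path(64, fast_path))
print(ttl_after_path(64, slow_path))
```

Even if every router honoured the optional rule, the 4.8 second stall leaves a TTL of 51 instead of 54. That costs only three more units than a normal hop, so a packet starting from a typical initial TTL of 64 is nowhere near being discarded for age.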

 

Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Saku Ytti
There certainly aren't any temporal buffers in SP gear limiting the
buffer to 100ms, nor are there any mechanisms to temporally decrease
TTL or hop-limit. Some devices may expose a temporal configuration in
the UX, but that is just a multiplier for max_buffer_bytes; what is
programmed is a fixed number of bytes, not a temporal limit as a
function of the observed traffic rate.
This is important because HW may support tens or even hundreds of
thousands of queues, since it may support a large number of logical
interfaces with HQoS and multiple queues each. If such a device is
run with a single logical interface that is low speed, either
physically or shaped, you may end up with very, very long temporal
queues. Not because people intend to queue long, but because
understanding all of this requires a lot of context and information
about the platform that isn't readily available, nor is it solved by
'just remove those buffers from devices physically, it's bufferbloat'.
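Saku's point is easiest to see with numbers. This is a back-of-the-envelope sketch. It assumes a queue whose byte limit was derived from a "100 ms" setting at a 10 Gb/s port rate, but which is actually served by a 10 Mb/s shaper. The rates and function names are illustrative and are not any vendor's defaults.

```python
def buffer_bytes_for(delay_s, rate_bps):
    """Bytes needed to hold delay_s worth of traffic arriving at rate_bps."""
    return delay_s * rate_bps / 8


def drain_time_s(buffer_bytes, rate_bps):
    """Seconds a full buffer of buffer_bytes takes to drain at rate_bps."""
    return buffer_bytes * 8 / rate_bps


# A "100 ms" limit converted to bytes at a 10 Gb/s port rate...
buf = buffer_bytes_for(0.100, 10e9)
print(f"{buf / 1e6:.0f} MB")
# ...but the queue is actually served by a 10 Mb/s shaper.
print(f"{drain_time_s(buf, 10e6):.0f} s")
```

The hardware enforces bytes, not time. The same 125 MB limit is 100 ms deep at line rate but 100 s deep behind the slow shaper.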

Like others have pointed out, there is not much information to go on,
and this could be many things. One of those could be 'buffer bloat',
as Taht pointed out; this might be true because of the cyclical nature
of the ping, with the buffer getting filled and drained. I don't really
think ARP/ND is a good candidate, as Herrin suggested, because it's
cyclical instead of exactly a single event, but it's not impossible.

We'd really need to see the full mtr output, whether or not this
affects other destinations, whether it affects just ICMP or also DNS,
and ideally a reverse traceroute as well. I can tell you that I'm not
observing the issue, nor did I expect to observe it, as I expect the
problem to be close to your network and therefore to affect a lot of
destinations.


-- 
  ++ytti


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread William Herrin
On Wed, Dec 21, 2022 at 10:07 PM Saku Ytti  wrote:
> I don't really think
> ARP/ND is good candidate like Herrin suggested, because it's
> cyclical, instead of exactly single event, but not impossible.

Suppose you have a loose network cable between your Linux server and a
switch. Layer 1. That RJ45 just isn't quite solid. It's mostly working
but not quite right. What does it look like at layer 2? One thing it
can look like is a periodic carrier flash where the NIC thinks it has
no carrier, then immediately thinks it has enough of a carrier to
negotiate speed and duplex. How does layer 3 respond to that?

1s: send ping toward default router
1.1s: ping response from remote server
2s: send ping toward default router
2.1s: ping response from remote server
2.5s: carrier down
2.501s: carrier up
3s: queue ping, arp for default router, no response
4s: queue ping, arp for default router, no response
5s: queue ping, arp for default router, no response
6s: queue ping, arp for default router, no response
7s: queue ping, arp for default router
7.01s: arp response, send all 5 queued pings but note that the
earliest is more than 4 seconds old.
7.1s: response from all 5 queued pings.

Cable still isn't right though, so in a few seconds or a few minutes
you're going to get another carrier flash and the pattern will repeat.
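Bill's timeline can be replayed as a toy model. The model makes three assumptions:

* Ping sends one probe per second.
* Probes handed to the stack after the carrier flash wait for ARP until it is answered at 7.01 s.
* A probe on the wire sees a fixed 0.09 s path RTT. Because of this, the two pre-flash probes show 0.09 s where the timeline rounds to 0.1 s.

As Bill explains later in the thread, ping starts its clock when the probe is handed to the stack, so the model counts queueing time in the RTT. The function and parameter names are hypothetical.

```python
def simulate_arp_stall(send_times_s, stall_start_s, resolved_at_s, path_rtt_s):
    """Return (send_time, reported_rtt) pairs for a toy ARP stall.

    Probes handed to the stack at or after stall_start_s sit in the host's
    queue until the next-hop MAC is learned at resolved_at_s; earlier probes
    go out immediately. The RTT clock starts when ping hands the probe to
    the stack, so time spent queued is included in the reported RTT.
    """
    results = []
    for sent in send_times_s:
        on_wire = max(sent, resolved_at_s) if sent >= stall_start_s else sent
        reply_at = on_wire + path_rtt_s
        results.append((sent, round(reply_at - sent, 3)))
    return results


# Carrier flash at ~2.5 s, ARP finally answered at 7.01 s,
# replies back 0.09 s after the queued probes leave.
for sent, rtt in simulate_arp_stall([1, 2, 3, 4, 5, 6, 7],
                                    stall_start_s=2.5,
                                    resolved_at_s=7.01,
                                    path_rtt_s=0.09):
    print(f"sent at {sent} s -> reported RTT {rtt} s")
```

The queued probes report RTTs of 4.1, 3.1, 2.1, 1.1 and 0.1 seconds. Each falls by exactly the send interval, and all of their replies arrive at about 7.1 s. This is the same shape as the tail of Jason's output.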

I've also seen some cheap switches get stuck doing this even after the
faulty cable connection is repaired, not clearing until a reboot.

Regards,
Bill Herrin




Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread Saku Ytti
On Thu, 22 Dec 2022 at 08:41, William Herrin  wrote:

> Suppose you have a loose network cable between your Linux server and a
> switch. Layer 1. That RJ45 just isn't quite solid. It's mostly working
> but not quite right. What does it look like at layer 2? One thing it
> can look like is a periodic carrier flash where the NIC thinks it has
> no carrier, then immediately thinks it has enough of a carrier to
> negotiate speed and duplex. How does layer 3 respond to that?

Agreed. But then once the resolve happens and Linux floods the queued
pings out, the responses would come back ~immediately. So the delta between
the RTTs would remain at the send interval, in this case 1s. In this
case, we see the RTT decreasing as if the buffer is being purged,
until it seems to be filled again, up to 5s or so.

I don't exclude the rationale, I just think it's not likely based on
the latencies observed. But at any rate with so little data, my
confidence to include or exclude any specific explanation is low.


-- 
  ++ytti


Re: Large RTT or Why doesn't my ping traffic get discarded?

2022-12-21 Thread William Herrin
On Wed, Dec 21, 2022 at 11:03 PM Saku Ytti  wrote:
> On Thu, 22 Dec 2022 at 08:41, William Herrin  wrote:
> > Suppose you have a loose network cable between your Linux server and a
> > switch. Layer 1. That RJ45 just isn't quite solid. It's mostly working
> > but not quite right. What does it look like at layer 2? One thing it
> > can look like is a periodic carrier flash where the NIC thinks it has
> > no carrier, then immediately thinks it has enough of a carrier to
> > negotiate speed and duplex. How does layer 3 respond to that?
>
> Agreed. But then once the resolve happens, and linux floods the queued
> pings out, the responses would come ~immediately. So the delta between
> the RTT would remain at the send interval, in this case 1s. In this
> case, we see the RTT decreasing as if the buffer is being purged,
> until it seems to be filled again, up-until 5s or so.

Howdy,

Not quite. The ping origination time isn't set when layer 3 decides
the packet can be delivered to layer 2, it's set when layer 7 drops
the packet on the stack. In other words: when the ping app "sends" the
packet, not when the NIC actually puts the packet on the wire or even
when the OS sends the packet over to the NIC. The time the packet
spends queued waiting for ARP to supply a next-hop MAC address counts
against the round trip time.
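Ping-style tools usually time probes in a way that shows where the clock starts. The sender writes its send time into the Echo Request's data field. The target echoes the data back unchanged (RFC 792). The RTT is then the receive time minus that embedded value. Classic ping implementations typically place a timestamp at the start of the data.

The sketch below builds such a probe in memory only, because sending it would need a raw socket and privileges. The header layout and the RFC 1071 checksum follow the ICMP specification. Several details are assumptions for illustration: the 8-byte float timestamp, the function names, and the 0x1234 identifier.

```python
import struct
import time


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones'-complement of the ones'-complement sum of
    16-bit big-endian words (odd-length data is padded with a zero byte)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, sent_at: float) -> bytes:
    """ICMP Echo Request (type 8, code 0) whose data starts with sent_at."""
    payload = struct.pack("!d", sent_at) + bytes(48)   # 56 data bytes in total
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field = 0
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def rtt_from_reply(packet: bytes, received_at: float) -> float:
    """RTT as ping computes it: receive time minus the echoed timestamp."""
    (sent_at,) = struct.unpack_from("!d", packet, 8)  # data follows 8-byte header
    return received_at - sent_at


sent = time.monotonic()          # the moment the "app" hands the probe over
probe = build_echo_request(ident=0x1234, seq=392, sent_at=sent)
# Any time the probe now spends queued (e.g. waiting for ARP) is not
# recorded in the packet, but it still counts: the clock already started.
print(f"{rtt_from_reply(probe, sent + 4.834737):.3f} s")
```

The reply's data is a copy of the request's, so the request bytes stand in for the reply here. Whatever happens between `time.monotonic()` and the packet reaching the wire is already inside the measured RTT, and that includes waiting for ARP.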

When you see this pattern of descending ping times exactly one second
apart, where the responses all arrived at once, it's usually because
something in the path didn't have the next-hop MAC address for a
while, and then it did. And it's usually not something deep in the
network, because something deep would exhaust its transmission queue
long before it could queue several seconds' worth of pings.

If you want to prove this to yourself, set up a Linux box, install a
filter to drop arp replies (arptables or nftables), delete the arp
entry for your default router (arp -d) and then start pinging
something. When you -remove- the arp filter, you'll see the pattern in
the ping responses that Jason posted.

You may get different results in other OSes. For example, Windows will
lose its DHCP address with the carrier flash, so when ping tries to
send the packet the network is unreachable. Because the stack
considers the network unreachable, the ping packet isn't queued and
the error is reported immediately to the application.

Regards,
Bill Herrin

