Re: Pinging a Device Every Second

2018-12-20 Thread Olav Kvittem
Hi,

The link is not the only component that can fail - routers and routing protocols
contribute at least as much.
If your customers have redundant connections,
you will also want to look at convergence times.
So an end-to-end measurement from a probe in the customer's network would give
you a truer picture.
Given that even sub-second outages can disturb a video meeting,
you may want to poll more often than once a second.
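
As a rough sketch of what sub-second polling of a single CPE could look like
(the target address is a placeholder and the flags are the Linux ping flags,
so this is only an illustration, not a recommendation of a particular tool):

    #!/usr/bin/env python3
    # Rough sketch: probe one CPE address every 200 ms and log missed replies.
    # 192.0.2.1 is a placeholder address; -W 1 is the Linux ping reply timeout.
    import subprocess, time

    TARGET = "192.0.2.1"
    INTERVAL = 0.2   # seconds between probes

    while True:
        start = time.monotonic()
        rc = subprocess.run(["ping", "-c", "1", "-W", "1", TARGET],
                            stdout=subprocess.DEVNULL).returncode
        if rc != 0:
            print(f"{time.time():.3f} missed probe to {TARGET}")
        # sleep only for what is left of the interval to keep the pace steady
        # (note that a timed-out probe will overrun the 200 ms budget)
        time.sleep(max(0.0, INTERVAL - (time.monotonic() - start)))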

Once you realise that your "internet service" depends on the quality of all the
other service providers, and you start monitoring that, you understand that
you are "in deep shit" ;-)


I did a small-scale global inter-domain measurement and discovered that the
sheer number of small outages is way too high.

Many of them might be routing changeovers in multi-redundant networks.

cheers
Olav

On 15.12.2018 18:55, Tim Pozar wrote:
> In one of my client's companies, we use LibreNMS. It is normally used
> to get SNMP data, but we also have it configured to ping our more
> "high touch" clients' routers. In that case we can record performance
> such as latency and packet loss. It will generate graphs that we can
> pass on to the client. It also can be set to alert us if a client's
> router is not pingable.
>
> LibreNMS can also integrate Smokeping if you want Smokeping-style
> graphs showing standard deviation, etc.
>
> Currently I am running LibreNMS on a VM on a Proxmox cluster with a
> couple of cores. It is probing 385 devices every 5 minutes and keeping
> up with that. In polling, SNMP is the real time and CPU hog, where
> ping is pretty low impact.
>
> Tim
>
> On 12/15/18 9:37 AM, Baldur Norddahl wrote:
>> You could configure BFD to send out an SNMP alert when three packets
>> have been missed on a 50 ms cycle, or instantly if the interface
>> changes state to down. This way you would know that they are down
>> within 150 ms.
>>
>> BFD is the hardware solution. A Linux box that has to ping 1000
>> addresses per second will be very taxed and likely unable to do that
>> in a stable way. You will have seconds where it fails to do them all,
>> followed by seconds where it attempts to do them more than once. The
>> result is that the statistics gathered are worthless. If you do
>> something like this, it is much better to have a less ambitious
>> 1-minute cycle.
>>
>> Take a look at Smokeping. If you want a graph to show the quality of
>> the line, Smokeping makes some very good graphs for that.
>>
>> Regards Baldur
>>
>> 15. dec. 2018 16.49, "Colton Conor" wrote:
>>
>> How much compute and network resources does it take for an NMS to:
>>
>> 1. ICMP ping a device every second
>> 2. Record these results.
>> 3. Report an alarm after so many seconds of missed pings.
>>
>> We are looking for a system to monitor in near real time whether an
>> end customer's router is up or down. SNMP, I assume, would be too
>> resource intensive, so ICMP pings seem like the only logical solution.
>>
>> The question is: is once-a-second pinging too much polling for an NMS
>> and a consumer-grade router? Does it take much network bandwidth and
>> CPU resources on both the NMS and CPE side?
>>
>> Let's say this is for a 1,000 customer ISP.
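
As a side note on the scheduling concern Baldur raises above: rather than
bursting all probes at the start of each second, the sends can be spread
evenly across the cycle. A rough Python/asyncio sketch (the target list and
the Linux ping flags are placeholders, not anything from this thread):

    #!/usr/bin/env python3
    # Rough sketch: pace probes to many targets evenly over a 1-second sweep
    # instead of firing them all at once. Targets are placeholder addresses.
    import asyncio

    TARGETS = [f"192.0.2.{i}" for i in range(1, 101)]   # placeholder list
    CYCLE = 1.0                                          # seconds per sweep

    async def probe(addr):
        # one ICMP echo with a 1-second timeout (Linux ping flags)
        proc = await asyncio.create_subprocess_exec(
            "ping", "-c", "1", "-W", "1", addr,
            stdout=asyncio.subprocess.DEVNULL)
        return addr, await proc.wait()

    async def sweep():
        spacing = CYCLE / len(TARGETS)
        tasks = []
        for addr in TARGETS:
            tasks.append(asyncio.create_task(probe(addr)))
            await asyncio.sleep(spacing)   # pace the sends across the cycle
        for addr, rc in await asyncio.gather(*tasks):
            if rc != 0:
                print(f"missed {addr}")

    asyncio.run(sweep())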



Re: xplornet contact or any experience with their satellite service?

2020-04-21 Thread Olav Kvittem via NANOG

On 21.04.2020 20:59, Brandon Martin wrote:
> On 4/21/20 2:35 PM, Brian J. Murrell wrote:
>> Interesting.  So basically as Mel said, over-sold network.  :-(
>
> This is pretty typical of consumer VSAT and such.  You can of course
> get better performance... if you're willing to pay for it.  If you find
> the right carrier/re-seller, you can perhaps find one whose "system"
> (to include ground station, terminals, and smarts on the bird) can
> respect DSCP and flag at least your voice traffic appropriately
> (probably EF) to perhaps lower the jitter and make conferencing more
> palatable.  Finding competent folks at a typical consumer provider's
> helpdesk to talk to about such things is probably the limiting factor.
> The higher up the food chain you go, towards the folks who "own" the
> spectrum rather than re-sell it, the more luck you will perhaps have
> with something like this, though probably also at a higher MRC.
>
> Unfortunately, it's just what happens when you spread an already
> limited resource (transponder bandwidth) out over essentially an
> entire continent, or at least substantial portions of it.  Imagine if
> you had a cable provider with a single node for an entire, say, US
> state.


It could be interesting to do one-way-delay measurements in parallel to
see how the packets are travelling, and where and how big the queues are
(bufferbloat).

Likewise, if you have a Unix system you could look at TCP statistics to
see if you have packet retransmits (netstat -s).
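
For example, a quick way to pull the retransmit counters out of netstat -s
(the output wording differs between Linux and BSD, so the substring match
below is only an approximation):

    #!/usr/bin/env python3
    # Rough sketch: print the TCP retransmit-related lines from "netstat -s".
    import subprocess

    out = subprocess.run(["netstat", "-s"],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "retransmit" in line.lower():
            print(line.strip())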

As previously noted, TDMA between uploading or even ACK'ing users could
be a bottleneck for the download speed.



Olav



Re: Bottlenecks and link upgrades

2020-08-13 Thread Olav Kvittem via NANOG

On 12.08.2020 09:31, Hank Nussbacher wrote:
>
> At what point do commercial ISPs upgrade links in their backbone as
> well as peering and transit links that are congested?  At 80%
> capacity?  90%?  95%? 
>

Hi,


Wouldn't it be better to measure basic performance, like packet drop
rates and queue sizes?

These days live video is a necessity, and these parameters are essential
to its quality.

Queues build up in milliseconds, while people average over minutes to
estimate quality.


If you measure queue delay with high-frequency one-way-delay
measurements, you would be able to give better advice on what the
consequences of a highly loaded link are.
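
As a rough illustration (not our actual measurement code), a minimal
one-way-delay receiver could look like this - it assumes the sender puts its
send time at the start of each UDP packet and that both clocks are NTP/PTP
synchronised; the port number is made up:

    #!/usr/bin/env python3
    # Rough one-way-delay receiver sketch: the sender is assumed to pack its
    # send time (time.time() as a network-order double) at the start of each
    # UDP payload, and both hosts are assumed to be clock-synchronised.
    import socket, struct, time

    PORT = 5005   # placeholder port
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))

    while True:
        data, addr = sock.recvfrom(64)
        t_send, = struct.unpack("!d", data[:8])
        owd_ms = (time.time() - t_send) * 1000.0
        print(f"{addr[0]} one-way delay {owd_ms:.2f} ms")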


We are running a research project on end-to-end quality, and the enclosed
image is yesterday's report on queue size (h_ddelay) in ms. It shows stats
on delays between some peers.

I would have looked at the trends on the involved links to see if an
upgrade is necessary - 421 ms might be too much if it happens often.


Best regards


  Olav Kvittem


>
> Thanks,
> Hank
>
>
> Caveat: The views expressed above are solely my own and do not express
> the views or opinions of my employer
>




Re: Bottlenecks and link upgrades

2020-08-13 Thread Olav Kvittem via NANOG
Hi Mark,


Just comments on your points below.

On 13.08.2020 12:31, Mark Tinka wrote:
>
> On 13/Aug/20 12:23, Olav Kvittem via NANOG wrote:
>
>> Wouldn't it be better to measure basic performance, like packet drop
>> rates and queue sizes?
>>
>> These days live video is a necessity, and these parameters are
>> essential to its quality.
>>
>> Queues build up in milliseconds, while people average over minutes to
>> estimate quality.
>>
>> If you measure queue delay with high-frequency one-way-delay
>> measurements, you would be able to give better advice on what the
>> consequences of a highly loaded link are.
>>
>> We are running a research project on end-to-end quality, and the
>> enclosed image is yesterday's report on queue size (h_ddelay) in ms.
>> It shows stats on delays between some peers.
>>
>> I would have looked at the trends on the involved links to see if an
>> upgrade is necessary - 421 ms might be too much if it happens often.
>>
> I'm confident everyone (even the cheapest CFO) knows the consequences of
> congesting a link and choosing not to upgrade it.
>
> Optical issues, dirty patch cords, faulty line cards and wrong
> configurations will most likely lead to packet loss. Link congestion
> due to insufficient bandwidth will most certainly lead to packet loss.
Sure, but I guess the loss rate depends on the nature of the traffic.
>
> It's great to monitor packet loss, latency, pps, etc. But packet loss
> at 10% link utilization is not a foreign occurrence. No amount of
> bandwidth upgrades will fix that.


I guess that having more reports would support the judgement better.

A basic question is: what is the effect on the quality perceived by the
customers?

And the relation between that and the 5-minute average load is not known
to me.

Actually, one good indicator of the congestion loss rate is of course the
SNMP OutputDiscards counter.


Curves for queueing delay, link load and discard rate are surprisingly
different.
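
For illustration, a rough sketch of sampling that discard counter with
net-snmp's snmpget from Python - the host, community string and ifIndex here
are made up, and counter wrap is ignored:

    #!/usr/bin/env python3
    # Rough sketch: sample IF-MIB::ifOutDiscards via net-snmp's snmpget and
    # print the per-interval discard rate. Host/community/ifIndex are
    # placeholders; 32-bit counter wrap is not handled.
    import subprocess, time

    HOST, COMMUNITY, IFINDEX = "192.0.2.10", "public", 1
    INTERVAL = 60   # seconds between samples

    def out_discards():
        out = subprocess.run(
            ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", HOST,
             f"IF-MIB::ifOutDiscards.{IFINDEX}"],
            capture_output=True, text=True).stdout
        return int(out.strip())

    prev = out_discards()
    while True:
        time.sleep(INTERVAL)
        cur = out_discards()
        print(f"{time.time():.0f} discards/s {(cur - prev) / INTERVAL:.3f}")
        prev = cur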


regards

 Olav



>
> Mark.

