Hi

Customers do not usually complain about 2 minutes of downtime unless it is a repeating event. We will therefore offer to put such customers' lines in monitor mode, which means we add them to smokeping. You could also start pinging once a second, which would be no problem if only a few customers are in monitor mode.
However, 2 minutes of downtime is more often a symptom of bad wifi than of the internet connection.

Regards,

Baldur

On Sat, Dec 15, 2018 at 7:33 PM Colton Conor <colton.co...@gmail.com> wrote:

> The problem I am trying to solve is to be able to accurately tell a
> customer whether their home internet connection was up or down. Example:
> a customer calls in and says "my internet was down for 2 minutes
> yesterday". We need to be able to verify that their internet connection
> was indeed down. Right now we have no easy way to do this. Getting
> metrics like packet loss and jitter would be great too, though I realize
> the ICMP data path does not always equal customer experience, as many
> network devices prioritize ICMP traffic. However, ICMP pings over the
> internet do usually accurately tell whether a customer's modem is indeed
> online or not.
>
> Most devices out in the field, like ONTs and DSL modems, do not support
> SNMP but rather use TR-069 for management. Most of these devices only
> check into the TR-069 ACS server once a day.
> If the consumer device does support SNMP, it usually has a weak Broadcom
> or Qualcomm SoC processor, an outdated embedded Linux kernel, and limited
> RAM and storage. Most of these can't handle SNMP walks every 5 minutes,
> let alone every minute. We are talking about sub-$100 routers here, not
> Juniper, Cisco, Arista, etc.
>
> Almost all of these consumer devices are connected to a carrier
> aggregation device like a DSLAM, OLT, ethernet switch, or wireless access
> point. These access devices do support SNMP, but most manufacturers
> recommend only 5-minute SNMP polling, so a 2-minute outage would not
> easily be detected. Plus it's hard to correlate that consumer X is on
> port Y of an access switch, and to get that right for a tier 1 CSR.
>
> The only two ways I think I can accomplish this are:
> 1. ICMP pings to a device every so many seconds. Almost every device
>    supports responding to WAN ICMP pings.
> or
> 2.
>    IPFIX sampling at the core router, and then drilling down by customer
>    IP. I think this will tell me if any data was flowing to this
>    customer's IP on a second-by-second basis, but won't necessarily give
>    us an up or down indicator. Requires nothing from the consumer's
>    router.
>
>
> On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell <l...@satchell.net>
> wrote:
>
>> On 12/15/18 7:48 AM, Colton Conor wrote:
>> > How much compute and network resources does it take for an NMS to:
>> >
>> > 1. ICMP ping a device every second
>> > 2. Record these results.
>> > 3. Report an alarm after so many seconds of missed pings.
>> >
>> > We are looking for a system to monitor in near real time whether an
>> > end customer's router is up or down. SNMP, I assume, would be too
>> > resource intensive, so ICMP pings seem like the only logical solution.
>> >
>> > The question is: are once-a-second pings too much polling for an NMS
>> > and a consumer-grade router? Does it take much network bandwidth and
>> > CPU resources from both the NMS and CPE side?
>> >
>> > Let's say this is for a 1,000 customer ISP.
>>
>> What problem are you trying to solve, exactly? That more than anything
>> will dictate what you do.
>>
>> Short answer: about 1500 bits of bandwidth, and the CPU loading on the
>> remote device is almost invisible. Remember the only real difference
>> between ping and SNMP monitoring (UDP) is the organization of the bits
>> in the packet and the protocol number in the IP header. It's still one
>> packet pair exchanged, unless you get really ambitious with your SNMP
>> OID list.
>>
>> When I was in a medium-sized hosting company, I developed an SNMP-based
>> monitoring system that would query a number of load parameters (CPU,
>> disk, network, overall) on a once-a-minute schedule, and would keep
>> history for hours on the monitoring server. The boss fretted about the
>> load such monitoring would impose. He never saw any.
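
The bandwidth question can be roughed out from packet sizes. The sketch below is a back-of-envelope estimate only, and every size in it is an assumption rather than something stated in the thread: a ping with 56 bytes of payload (a common default), IPv4 without options, and untagged Ethernet framing. The function names are made up for illustration. Under those assumptions one request/reply pair comes out in the same ballpark as the "about 1500 bits" figure.

```python
# Back-of-envelope ICMP echo load. All sizes below are assumptions:
# a ping with 56 bytes of payload over IPv4 and untagged Ethernet.
ICMP_HEADER = 8         # bytes
PAYLOAD = 56            # bytes, a common ping default
IPV4_HEADER = 20        # bytes, no IP options
ETHERNET_OVERHEAD = 18  # bytes: 14-byte header + 4-byte FCS


def ping_bits_per_exchange(payload: int = PAYLOAD) -> int:
    """Bits for one echo request plus its echo reply (both directions)."""
    frame_bytes = ETHERNET_OVERHEAD + IPV4_HEADER + ICMP_HEADER + payload
    return 2 * frame_bytes * 8


def total_bps(customers: int, pings_per_second: float) -> float:
    """Aggregate bits per second at the monitoring server, both directions."""
    return customers * pings_per_second * ping_bits_per_exchange()


if __name__ == "__main__":
    print("bits per ping exchange:", ping_bits_per_exchange())
    print("1,000 customers, 1 ping/s (Mbit/s):", total_bps(1000, 1.0) / 1e6)
    print("1,000 customers, 6 s cycle (kbit/s):", total_bps(1000, 1 / 6) / 1e3)
```

With these assumed sizes, 1,000 customers pinged once a second works out to well under 2 Mbit/s at the monitoring server, and a six-second cycle to a few hundred kbit/s. Each individual customer sees only its own single exchange per cycle.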
>>
>> For pure link monitoring, which is what I'm hearing you want to do, in
>> my experience a six-second ping cycle gives lots of early warning for
>> link failures. Again, it depends on the specifications and detection
>> targets.
>>
>> Some things to consider:
>>
>> 1. Router restarts take a while. Consumer-grade routers can take a
>> minute or more to complete a restart to the point where they will
>> respond to ping. Carrier-grade routers are more variable but in general
>> have so many options built into them that it takes longer to complete a
>> restart cycle. Since you are talking consumer-grade gear, you probably
>> don't want to be sensitive to CP power sags.
>>
>> 2. Depending on the technology used on the link, you may get some
>> short-term outages, on the order of seconds, so doing "rapid" pings
>> does nothing for you. During my DSL time, ATM would drop out for short
>> intervals -- so watch out for nuisance trips.
>>
>> 3. Some routers implement ping limiting, so you have to balance your
>> monitoring sample rate against DoS susceptibility. Offhand, I don't know
>> the granularity of consumer router ping limiting, as I've never had that
>> question pop up.
>>
>> 4. How large a monitoring server are you willing to devote to such a
>> system? My web host monitoring used a 400-MHz Pentium II box, and it
>> didn't even breathe hard. (A 1U Cobalt box, repurposed with Red Hat
>> Linux, pulled from a junk pile.) I was monitoring about 150 web host
>> servers. Extrapolating from the system load on that Cobalt box, I could
>> have handled 1500 web host servers and more.
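
The record-then-alarm steps and the nuisance-trip concern discussed in this thread can be illustrated with a small sketch. It is not taken from any tool named here. It assumes ping results have already been recorded as (timestamp, replied) pairs, and it only reports an outage after a run of consecutive missed pings. The names `Outage`, `find_outages` and `min_missed`, and the threshold of 3, are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Outage:
    start: float  # timestamp of the first missed ping in the run
    end: float    # timestamp of the next reply (or of the last sample seen)


def find_outages(samples: Iterable[Tuple[float, bool]],
                 min_missed: int = 3) -> List[Outage]:
    """Scan time-ordered (timestamp, replied) samples for outages.

    A run of fewer than min_missed consecutive misses is ignored, so a
    single dropped ping does not count as downtime.
    """
    outages: List[Outage] = []
    run_start: Optional[float] = None
    last_ts: Optional[float] = None
    missed = 0
    for ts, replied in samples:
        last_ts = ts
        if replied:
            if missed >= min_missed:
                outages.append(Outage(run_start, ts))
            missed = 0
        else:
            if missed == 0:
                run_start = ts
            missed += 1
    # A run still in progress at the end of the data is reported as ongoing.
    if missed >= min_missed:
        outages.append(Outage(run_start, last_ts))
    return outages


if __name__ == "__main__":
    # Six-second cycle for five minutes: down from t=60 until t=180,
    # plus one isolated lost ping at t=240.
    samples = [(t, not (60 <= t < 180) and t != 240) for t in range(0, 300, 6)]
    for o in find_outages(samples):
        print(f"down from t={o.start}s to t={o.end}s ({o.end - o.start}s)")
```

With a six-second cycle and a threshold of three misses, anything shorter than roughly 18 seconds is ignored, which trades sensitivity for fewer false alarms. The right threshold depends on the link technology and on how long the routers in question take to restart.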