Hi Emanuele,

Thank you again for the detailed responses.

From the interfaces page, I see these stats:

    Total Traffic    91.6 GB [103,062,265 Pkts]
    Dropped Packets  0 Pkts

I don't see any dropped packets on the NIC either:

    ethtool -S enp2s0
    NIC statistics:
         tx_packets: 0
         rx_packets: 106581943
         tx_errors: 0
         rx_errors: 0
         rx_missed: 0
         align_errors: 0
         tx_single_collisions: 0
         tx_multi_collisions: 0
         unicast: 105432876
         broadcast: 350738
         multicast: 1149060
         tx_aborted: 0
         tx_underrun: 0

As of right now, 2 of the hosts we are discussing are still in alert, at
the original Date/Time of 07:25:01, and the Duration is now
"3 Days, 08:06:59". Given that my replies vs. requests ratio is still
configured at 50%, this means that, at every 5-minute interval for the
last 3 days and 8 hours, said host has been receiving < 50% DNS replies,
correct? I find this difficult to believe, and cannot find ANY missing
packets in my pcap file.

I captured a 30-minute pcap file with this command:

    tcpdump -i enp2s0 -G 1800 -w /tmp/enp2s0.%FT%T.pcap host edgemax and port 53

This file contains DNS traffic to/from edgemax only.

I can count responses like this:

    tshark -t a -r enp2s0.2020-05-11T13:00:02.pcap | grep -c "Standard query response"
    349

And queries like this:

    tshark -t a -r enp2s0.2020-05-11T13:00:02.pcap | grep -c "Standard query 0x"
    349

In other words, there are no missing DNS responses in the 30 minutes
spanning 13:00:02 to 13:29:51. I would think that the alert should
"clear" because the threshold is not exceeded within that 30-minute pcap
file.

In any case, at 13:23, I manually clicked the "Release" button for that
alert. 2 minutes later, at 13:25:00, I received this alert:

    Host edgemax has received 62 DNS requests but sent 0 DNS replies [5 Minutes ratio: 0%]

As stated previously, there are no missing DNS responses in the 30
minutes spanning 13:00:02 to 13:29:51. Why does ntopng think 62 replies
are missing?

I exported 10 minutes of PCAP from if_stats.lua.
Using the filter

    (ip.dst_host == "10.12.17.1" or ip.src_host == "10.12.17.1") and dns

I am not able to find any missing DNS responses in Wireshark.

Interestingly, if I specify a BPF filter ("port 53"), the downloaded PCAP
file seems to have only one side (i.e. edgemax is only a source, never a
destination). Without a BPF filter, the download is fine.

On Mon, May 11, 2020 at 8:59 AM Emanuele Faranda <fara...@ntop.org> wrote:

> Hi Aaron,
>
> Please see below:
>
> On 5/8/20 10:27 PM, Aaron Scamehorn wrote:
>
> Thank you for your response. In the screenshot below, can you please
> explain the significance of the "Date/Time" and the "Duration" columns?
> What do they mean in this context?
>
> Date/Time: the time when the alert was triggered. Ntopng performs
> periodic checks in order to trigger alerts. In this particular case, the
> check on the requests/reply ratio is performed every 5 minutes. So this
> means that the problem started between 07:20 and 07:25.
>
> Duration: the total time in which the problem was active. Again, the
> check is performed every 5 minutes for this alert, so 5 minutes is the
> granularity.
>
> Do I understand correctly that all 3 hosts triggered the alert at
> 07:25:01 (OR 07:30:01) this morning? And that all three alerts have been
> active for the past 07:28:53 hours? Does this mean that no new
> additional DNS Reply/Request issues have been detected?
>
> As explained above, the problem started between 07:20 and 07:25. For
> 07:28:53 hours the problem was active on all three hosts (the
> requests/reply ratio threshold was exceeded for 07:28:53 hours).
>
> I notice in the "Past Alerts" tab that there are many Reply/Request
> Alerts for the same host with very short durations (screenshot #2).
> When/how does an alert move from the "Engaged" to the "Past" tab?
>
> In this case, the engaged alert becomes a "past" alert when, after the
> check performed every 5 minutes, the requests/reply ratio threshold is
> no longer exceeded. This can happen as soon as the next check is
> performed (5 minutes).
>
> So in the 2nd screenshot, fire-TV had an alert at 06:20:00 for 05:00
> minutes where 18 requests received 0 replies. Then another alert at
> 06:50:00 for 05:00 minutes. Were the 18 replies from the first alert
> ultimately received? And were they received 5 minutes after the alert
> occurred?
>
> The check is performed on the DNS packet counters. A DNS request cannot
> take 5 minutes to be answered. The fact that the alert was closed after
> 5/10 minutes could be related to one of these events:
>
> - The host went idle
>
> - The host did not send enough DNS requests
>
> - The new DNS requests made by the host were successfully answered.
>
> Context here is that 99% of the traffic is Internet traffic. Almost all
> of the pihole traffic is to forwarders. BTW, the way pihole works (by
> default) is that it replies 0.0.0.0 for blocked hosts. It should respond
> to every query.
>
> I tried the live_pcap_download.html
> <https://www.ntop.org/guides/ntopng/advanced_features/live_pcap_download.html>
> lua, but couldn't figure out the bpf_filter:
>
> curl --cookie "user=admin; password=xxxxx" "
> http://10.12.17.25:3000/lua/live_traffic.lua?ifid=0&duration=600&bpf_filter=\"port 53\""
>
> I also tried the download pcap on the if_stats.lua page. The downloaded
> pcap file seems to only contain incoming data (see wireshark)?
>
> This is consistent with the above alerts. Please ensure that ntopng is
> not dropping packets, as this would explain this behavior.
>
> If I just do a tshark on the same interface that ntopng is listening on,
> I see all of the expected DNS queries & replies. I am not able to
> correlate the alerts to any missing packets.
>
> See response above.
>
> Regards,
>
> Emanuele
>
> On Fri, May 8, 2020 at 2:53 AM Emanuele Faranda <fara...@ntop.org> wrote:
>
>> Hi Aaron,
>>
>> The alerts that you are reporting basically tell you that such hosts
>> receive DNS requests but do not send a reply.
>> In order to troubleshoot possible problems you should augment such
>> information with the knowledge of your network.
>>
>> The first question to answer is: are those hosts expected to accept DNS
>> requests? If not, are the requests generated from the internet or from
>> the LAN? In the first case a firewall to block such DNS requests may be
>> a good idea. In the latter case some hosts in the LAN may be
>> misconfigured. In the case of the pihole hosts, I expect pihole to
>> block some DNS requests for advertisement sites, so this could be
>> normal behaviour. The following ntopng features may also help you:
>>
>> https://www.ntop.org/guides/ntopng/advanced_features/live_pcap_download.html
>>
>> https://www.ntop.org/guides/ntopng/using_with_other_tools/n2disk.html
>>
>> https://www.ntop.org/guides/ntopng/historical_flows.html
>>
>> Regards,
>> Emanuele
>>
>> On 5/7/20 5:57 PM, Aaron Scamehorn wrote:
>>
>> Hello,
>>
>> I'm trying to understand how/why I am getting the "Replies / Requests
>> Ratio" warnings for DNS.
>>
>> I am suspicious of these alerts, and would like to know how/why they
>> are being generated. I am suspicious for the following reasons: 1) If
>> it really is as bad as indicated, I should notice problems. 2) The
>> "events" occur immediately after I clear the alerts, and tend to
>> persist for hours.
>>
>> In any case, I cleared the alerts last night, and this is what they
>> look like:
>>
>> 06/05/2020 22:15:00 12:31:28 Warning Replies / Requests Ratio Host
>> edgemax.example.net
>> <http://xps-630i.scamlan.net:3000/lua/host_details.lua?ifid=2&host=10.12.17.1@1&page=historical&epoch_begin=1588864588&epoch_end=1588868188>
>> has received 54 DNS requests but sent 0 DNS replies [5 Minutes ratio: 0%]
>>
>> 06/05/2020 22:15:00 12:31:28 Warning Replies / Requests Ratio Host
>> pihole.example.net
>> <http://xps-630i.scamlan.net:3000/lua/host_details.lua?ifid=2&host=10.12.17.3@1&page=historical&epoch_begin=1588864588&epoch_end=1588868188>
>> has sent 93 DNS requests but received 3 DNS replies [5 Minutes ratio: 3.2%]
>>
>> 06/05/2020 22:15:00 12:31:28 Warning Replies / Requests Ratio Host
>> pihole-2.example.net
>> <http://xps-630i.scamlan.net:3000/lua/host_details.lua?ifid=2&host=10.12.17.4@1&page=historical&epoch_begin=1588864588&epoch_end=1588868188>
>> has sent 97 DNS requests but received 1 DNS reply [5 Minutes ratio: 1.0%]
>>
>> _______________________________________________
>> Ntop mailing list
>> Ntop@listgateway.unipi.it
>> http://listgateway.unipi.it/mailman/listinfo/ntop
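
For anyone comparing a capture against these alerts: the check described in the thread works on 5-minute counters and on one direction per host (requests received by the host versus replies sent by it), whereas a single grep count over a 30-minute file mixes all windows and both directions. The sketch below is one way to redo that comparison offline. It is not ntopng's implementation. The names `DnsPacket`, `ratios_per_window` and `windows_below_threshold` are hypothetical, the alignment of windows to multiples of 300 seconds is an assumption, and so is the strict "less than" comparison against the 50% threshold. The packet records are assumed to have been extracted from the pcap beforehand (timestamp, source, destination, and the DNS query/response flag).

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 300       # 5-minute check interval mentioned in the thread
THRESHOLD_PERCENT = 50.0   # replies/requests threshold configured by the poster


@dataclass
class DnsPacket:
    """One DNS packet, as extracted from a capture (hypothetical record type)."""
    timestamp: float   # epoch seconds
    src: str           # source address or host name
    dst: str           # destination address or host name
    is_response: bool  # True for a DNS response, False for a query


def ratios_per_window(packets, host, window=WINDOW_SECONDS):
    """Count requests received and replies sent by `host` in each window.

    Returns {window_start: (requests_received, replies_sent, ratio_percent)};
    ratio_percent is None when the host received no requests in that window.
    """
    counts = defaultdict(lambda: [0, 0])
    for p in packets:
        start = int(p.timestamp // window) * window  # assumed window alignment
        if not p.is_response and p.dst == host:
            counts[start][0] += 1   # request received by the host
        elif p.is_response and p.src == host:
            counts[start][1] += 1   # reply sent by the host
    result = {}
    for start, (requests, replies) in sorted(counts.items()):
        ratio = 100.0 * replies / requests if requests else None
        result[start] = (requests, replies, ratio)
    return result


def windows_below_threshold(ratios, threshold=THRESHOLD_PERCENT):
    """Return the window start times whose ratio is below the threshold."""
    return [start for start, (_, _, ratio) in ratios.items()
            if ratio is not None and ratio < threshold]
```

Running `windows_below_threshold(ratios_per_window(packets, "edgemax"))` on records from the 13:00 to 13:30 capture would list any 5-minute window in which edgemax sent fewer than half as many replies as it received requests. An empty list would support the observation that the capture itself shows no missing replies.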