Hi Emanuele,

Thank you again for the detailed responses.

From the interfaces page, I see these stats:
Total Traffic 91.6 GB [103,062,265 Pkts] Dropped Packets 0 Pkts
I don't see any dropped packets on the NIC either:
ethtool -S enp2s0
NIC statistics:
     tx_packets: 0
     rx_packets: 106581943
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 105432876
     broadcast: 350738
     multicast: 1149060
     tx_aborted: 0
     tx_underrun: 0

As of right now, 2 of the hosts we are discussing are still in alert, at
the original Date/Time of 07:25:01, and Duration is now "3 Days, 08:06:59".

Given that my replies vs. requests ratio threshold is still configured at
50%, this means that, at every 5-minute check for the last 3 days, 8 hours,
the DNS replies/requests ratio for said host has been below 50%, correct?
I find this difficult to believe, and cannot find ANY missing packets in my
pcap file.
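
To spell out how I read the check, here is a minimal sketch. It assumes ntopng simply divides the replies counted in each 5-minute window by the requests counted in the same window and compares the result with the configured threshold. The function names are mine, not ntopng's, and the handling of a window with zero requests is a guess.

```shell
# ratio_pct REQUESTS REPLIES
# Prints replies as an integer percentage of requests (rounded down).
ratio_pct() {
  requests=$1
  replies=$2
  if [ "$requests" -eq 0 ]; then
    # Assumption: a window with no requests has nothing to alert on.
    echo 100
  else
    echo $(( replies * 100 / requests ))
  fi
}

# below_threshold REQUESTS REPLIES THRESHOLD_PCT
# Succeeds (exit 0) when the ratio is under the threshold, i.e. when I
# would expect the alert to be (or stay) engaged for that window.
below_threshold() {
  [ "$(ratio_pct "$1" "$2")" -lt "$3" ]
}
```

For example, 62 requests and 0 replies give a ratio of 0, so `below_threshold 62 0 50` succeeds, while a window with as many replies as requests gives 100 and does not.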

I have captured a 30-minute pcap file with this command:
tcpdump -i enp2s0 -G 1800 -w /tmp/enp2s0.%FT%T.pcap host edgemax and port 53

This file contains DNS traffic to/from edgemax only.
I can count responses like this:
tshark -t a -r enp2s0.2020-05-11T13:00:02.pcap | grep -c "Standard query response"
349
And queries like this:
tshark -t a -r enp2s0.2020-05-11T13:00:02.pcap | grep -c "Standard query 0x"
349

In other words, no missing DNS responses in the 30 minutes spanning
13:00:02 to 13:29:51.
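
Matching totals do not strictly prove that each query got its own answer, so a per-transaction cross-check may be worthwhile. The following is a minimal sketch, assuming tshark's `dns.id` and `dns.flags.response` fields and keying on the transaction ID alone. IDs can repeat across clients, so any hit is a lead rather than proof. The function name is mine, and the "True"/"False" handling is an assumption about how some tshark versions print boolean fields.

```shell
# unanswered_ids
# Reads "dns.id<TAB>dns.flags.response" lines on stdin and prints every
# transaction ID that was seen in more queries than responses.
unanswered_ids() {
  awk -F '\t' '
    $2 == "0" || $2 == "False" { q[$1]++ }   # query
    $2 == "1" || $2 == "True"  { r[$1]++ }   # response
    END { for (id in q) if (q[id] > r[id]) print id }
  ' | sort
}

# Intended use against the capture above (requires tshark):
# tshark -r enp2s0.2020-05-11T13:00:02.pcap -Y dns -T fields \
#   -e dns.id -e dns.flags.response | unanswered_ids
```

An empty result would back up the matching grep counts; any ID printed is a query worth looking at in Wireshark.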

I would think that the alert should "clear" because the threshold is not
exceeded within that 30 minute pcap file.

In any case, at 13:23, I manually clicked the "Release" button for that
alert.  2 minutes later, at 13:25:00, I received this alert:
Host edgemax has received 62 DNS requests but sent 0 DNS replies [5 Minutes
ratio: 0%]

As stated previously, no missing DNS responses in the 30 minutes spanning
13:00:02 to 13:29:51.  Why does ntopng think 62 replies are missing?

I exported 10 minutes of PCAP from if_stats.lua.  Using the filter
"(ip.dst_host == "10.12.17.1" or ip.src_host == "10.12.17.1") and dns" I am
not able to find any missing DNS responses in Wireshark.  Interestingly, if
I specify a BPF filter ("port 53"), the downloaded PCAP file seems to have
only one side (i.e. edgemax is only ever a source, never a destination).
Without a BPF filter, the download is fine.
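
The same export can also be scripted against live_traffic.lua. This is a minimal sketch that reuses the host, port, ifid, duration, and cookie from my earlier curl attempt (quoted below). That the endpoint expects the filter URL-encoded, rather than wrapped in escaped quotes, is an assumption on my part. The function name and the CURL override are mine.

```shell
# download_dns_pcap BPF_FILTER OUTFILE
# -G makes curl append the --data-urlencode values to the URL as a query
# string, so "port 53" is sent as bpf_filter=port%2053.
# Set CURL=echo to print the arguments instead of contacting ntopng.
download_dns_pcap() {
  ${CURL:-curl} -sS -G "http://10.12.17.25:3000/lua/live_traffic.lua" \
    --cookie "user=admin; password=xxxxx" \
    --data-urlencode "ifid=0" \
    --data-urlencode "duration=600" \
    --data-urlencode "bpf_filter=$1" \
    -o "$2"
}
```

Usage would be `download_dns_pcap "port 53" /tmp/edgemax-dns.pcap`. Whether this changes the one-sided result I see with the BPF filter is something I cannot say without trying it.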




On Mon, May 11, 2020 at 8:59 AM Emanuele Faranda <fara...@ntop.org> wrote:

> Hi Aaron,
>
> Please see below:
> On 5/8/20 10:27 PM, Aaron Scamehorn wrote:
>
> Thank you for your response.  In the screenshot below, can you please
> explain the significance of the "Date/Time" and the "Duration" columns?
> What do they mean in this context?
>
> Date/Time: the time when the alert was triggered. Ntopng performs periodic
> checks in order to trigger alerts. In this particular case, the check on
> the requests/reply ratio is performed every 5 minutes. So this means that
> the problem started between 07:20 and 07:25.
>
> Duration: the total time in which the problem was active. Again, the check
> is performed every 5 minutes for this alert so 5 minutes is the granularity.
>
>
> Do I understand correctly that all 3 hosts triggered the alert at 07:25:01
> (or 07:30:01) this morning?  And that all three alerts have been active
> for the past 07:28:53 hours?  Does this mean that no additional DNS
> Reply/Request issues have been detected?
>
> As explained above, the problem started between 07:20 and 07:25. For
> 07:28:53 hours the problem was active on all the three hosts (the
> requests/reply ratio threshold was exceeded for 07:28:53 hours).
>
>
> I notice in "Past Alerts" tab, that there are many Reply/Request Alerts
> for the same host with very short durations (screen shot #2).  When/how
> does an alert move from the "Engaged" to "Past" tab?
>
> In this case, the engaged alert becomes a "past" alert when, after the
> check performed every 5 minutes, the requests/reply ratio threshold is no
> longer exceeded. This can happen as soon as the next check is performed
> (5 minutes).
>
>
> So in the 2nd screenshot, fire-TV had an alert at 06:20:00 for 05:00
> minutes where 18 requests received 0 replies.  Then another alert at
> 06:50:00 for 05:00 minutes.  Were the 18 replies from the first alert
> ultimately received?  And were they received 5 minutes after the alert
> occurred?
>
> The check is performed on the DNS packet counters. A DNS request cannot
> take 5 minutes to be answered. The fact that the alert was closed after 5/10
> minutes could be related to one of these events:
>
> - The host went idle
>
> - The host did not send enough DNS requests
>
> - The new DNS requests made by the host were successfully answered.
>
>
> Context here is that 99% of the traffic is Internet traffic.  Almost all
> of the pihole traffic is to forwarders.  BTW, the way pihole works (by
> default) is it replies 0.0.0.0 for blocked hosts.  It should respond to
> every query.
>
> I tried the live_pcap_download.html
> <https://www.ntop.org/guides/ntopng/advanced_features/live_pcap_download.html>
> lua, but couldn't figure out the bpf_filter:
> curl --cookie "user=admin; password=xxxxx"  "
> http://10.12.17.25:3000/lua/live_traffic.lua?ifid=0&duration=600&bpf_filter=\"port
> 53\""
>
> I also tried the download pcap on the if_stats.lua page.   The downloaded
> pcap file seems to only contain incoming data (see wireshark)?
>
> This is consistent with the above alerts. Please ensure that ntopng is not
> dropping packets, as this would explain this behavior.
>
>
> If I just do a tshark on the same interface that ntopng is listening on, I
> see all of the expected DNS query & replies.  I am not able to correlate
> the alerts to any missing packets.
>
> See response above.
>
> Regards,
>
> Emanuele
>
>
>
>
> On Fri, May 8, 2020 at 2:53 AM Emanuele Faranda <fara...@ntop.org> wrote:
>
>> Hi Aaron,
>>
>> The alerts that you are reporting basically tell you that such hosts
>> receive DNS requests but do not send a reply. In order to troubleshoot
>> possible problems you should augment such information with the knowledge of
>> your network.
>>
>> The first question to answer is, are those hosts expected to accept DNS
>> requests? If not, are the requests generated from the internet or from the
>> LAN? In the first case a firewall to block such DNS requests may be a good
>> idea. In the latter case some hosts in the LAN may be misconfigured. In
>> case of the pihole hosts, I expect pihole to block some DNS requests for
>> advertisement sites so this could be a normal behaviour. The following
>> ntopng features may also help you:
>>
>>
>>     https://www.ntop.org/guides/ntopng/advanced_features/live_pcap_download.html
>>
>>     https://www.ntop.org/guides/ntopng/using_with_other_tools/n2disk.html
>>
>>     https://www.ntop.org/guides/ntopng/historical_flows.html
>>
>> Regards,
>> Emanuele
>> On 5/7/20 5:57 PM, Aaron Scamehorn wrote:
>>
>> Hello,
>>
>> I'm trying to understand how/why I am getting the "Replies / Requests
>> Ratio" warnings for DNS.
>>
>> I am suspicious of these alerts, and would like to know how/why they are
>> being generated.  I am suspicious for the following reasons:  1) If it
>> really is as bad as indicated, I should notice problems.  2) The "events"
>> occur immediately after I clear the alerts, and tend to persist for hours.
>>
>> In any case, I cleared the alerts last night, and this is what they look
>> like:
>>
>> 06/05/2020 22:15:00 12:31:28 Warning Replies / Requests Ratio   Host
>> edgemax.example.net
>> <http://xps-630i.scamlan.net:3000/lua/host_details.lua?ifid=2&host=10.12.17.1@1&page=historical&epoch_begin=1588864588&epoch_end=1588868188>
>> has received 54 DNS requests but sent 0 DNS replies [5 Minutes ratio: 0%]
>>
>> 06/05/2020 22:15:00 12:31:28 Warning Replies / Requests Ratio   Host
>> pihole.example.net
>> <http://xps-630i.scamlan.net:3000/lua/host_details.lua?ifid=2&host=10.12.17.3@1&page=historical&epoch_begin=1588864588&epoch_end=1588868188>
>> has sent 93 DNS requests but received 3 DNS replies [5 Minutes ratio: 3.2%]
>>
>> 06/05/2020 22:15:00 12:31:28 Warning Replies / Requests Ratio   Host
>> pihole-2.example.net
>> <http://xps-630i.scamlan.net:3000/lua/host_details.lua?ifid=2&host=10.12.17.4@1&page=historical&epoch_begin=1588864588&epoch_end=1588868188>
>> has sent 97 DNS requests but received 1 DNS reply [5 Minutes ratio: 1.0%]
>>
>>
>>
>>
>> _______________________________________________
>> Ntop mailing list
>> Ntop@listgateway.unipi.it
>> http://listgateway.unipi.it/mailman/listinfo/ntop
>
>
_______________________________________________
Ntop mailing list
Ntop@listgateway.unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop
