I've been seeing some issues that I believe are related to DNS
resolution. The short of it is that the results of
# dig web.whatsapp.com
start out as:
; <<>> DiG 9.4.2-P2 <<>> web.whatsapp.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57665
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;web.whatsapp.com. IN A
;; ANSWER SECTION:
web.whatsapp.com. 3595 IN CNAME mmx-ds.cdn.whatsapp.net.
mmx-ds.cdn.whatsapp.net. 55 IN A 31.13.70.49
;; Query time: 6 msec
;; SERVER: 192.168.254.254#53(192.168.254.254)
;; WHEN: Sun Sep 15 14:46:24 2019
;; MSG SIZE rcvd: 87
which seem reasonable (and functional), but then soon become:
;; Warning: Message parser reports malformed message packet.
; <<>> DiG 9.4.2-P2 <<>> web.whatsapp.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40939
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;web.whatsapp.com. IN A
;; ANSWER SECTION:
web.whatsapp.com. 3528 IN CNAME mmx-ds.cdn.whatsapp.net.
mmx-ds.cdn.whatsapp.net. 30772 RESERVED0 A \# 4 1F0D4631
;; Query time: 2 msec
;; SERVER: 192.168.254.254#53(192.168.254.254)
;; WHEN: Sun Sep 15 14:47:31 2019
;; MSG SIZE rcvd: 87
at which point I am no longer able to access web.whatsapp.com. Given
that WhatsApp is a Facebook property, I tried the above against
facebook.com, www.facebook.com, instagram.com, and www.instagram.com as
well. All but instagram.com (that is, facebook.com, www.facebook.com,
and www.instagram.com) return a hex (?) formatted version of the IP
address, similar to what is seen in the latter of the above examples.
My thinking is (or was) that there are some issues relating to
Facebook's DNS.
From outside of my network, however, other resolvers seem to be able to
continually resolve the above names correctly. I don't know what those
resolvers are, but specifically I am referring to whatever Linode and
DigitalOcean use in the nameservers they provide to their basic Linux
VMs (I am using the default network config in my VMs at Linode and
DigitalOcean). I have a suspicion that Linode uses unbound, but I do
not know how to verify that. Oh, as far as I can tell, those
facebook-family names *seem* to be the only names for which I see this
behavior -- all other names that I have tried to run through dig (and
nslookup) seem to return reasonable and seemingly correct results.
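For reference, the "hex (?)" form above looks like the generic rdata
notation from RFC 3597 (\# <length> <hex bytes>), which dig falls back
to when it does not print a record in its usual typed form. A minimal
Python sketch (the function names are purely illustrative and are not
part of dig or unbound) decodes it back to a dotted quad:

```python
import ipaddress


def decode_generic_rdata(text):
    """Decode RFC 3597 generic rdata such as r'\# 4 1F0D4631' into bytes."""
    parts = text.split()
    if len(parts) < 2 or parts[0] != "\\#":
        raise ValueError("not RFC 3597 generic rdata: %r" % text)
    length = int(parts[1])
    # The hex digits may be split into several whitespace-separated groups.
    data = bytes.fromhex("".join(parts[2:]))
    if len(data) != length:
        raise ValueError("declared %d bytes, found %d" % (length, len(data)))
    return data


def generic_a_to_ipv4(text):
    """Interpret 4 bytes of generic rdata as an IPv4 address string."""
    data = decode_generic_rdata(text)
    if len(data) != 4:
        raise ValueError("an A record carries exactly 4 bytes of rdata")
    return str(ipaddress.IPv4Address(data))


print(generic_a_to_ipv4(r"\# 4 1F0D4631"))  # 31.13.70.49
```

So the four address bytes match the 31.13.70.49 from the first answer;
what differs in the second answer is the class (RESERVED0, i.e. class 0,
rather than IN) and the TTL.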
A bit about my (home) network. I have Cox cable internet service, an
Arris SBG7580-AC, and an OpenBSD 6.5 machine that sits between the modem
and the rest of the network. I (we) do use the modem in router mode (but
without using the built-in WiFi) as my wife's work setup consists of a
pre-configured black-box of a Juniper device. Not wanting that device
in the rest of our network, I set the modem to "RoutedWithNAT" and the
two network devices plug into the modem, but provide two separate
networks. For remote ingress into the rest of the network, I set the
modem's DMZ to point to the OpenBSD box. My pf.conf does the usual
small network stuff including NAT, a bit of redirection, etc. It has
changed very little in the past several years. My unbound.conf is also
nearly unchanged since I first set it up when OpenBSD dropped bind and
replaced it with unbound. My OpenBSD machine provides name resolution
for the rest of the network. My unbound.conf follows:
server:
    interface: 0.0.0.0
    interface: ::1
    do-ip6: no
    access-control: 0.0.0.0/0 refuse
    access-control: 127.0.0.0/8 allow
    access-control: 192.168.0.0/16 allow
    access-control: 10.0.0.0/24 allow
    access-control: 172.16.0.0/24 allow
    access-control: ::0/0 refuse
    access-control: ::1 allow
    hide-identity: yes
    hide-version: yes
    # ftp://FTP.INTERNIC.NET/domain/named.cache
    root-hints: "/var/unbound/etc/named.cache"
    # uncomment to enable DNSSEC
    auto-trust-anchor-file: "/var/unbound/db/root.key"
    ### various local-zone, local-data, and local-data-ptr ###
remote-control:
    control-enable: yes
    control-use-cert: yes
    control-interface: /var/run/unbound.sock
do-ip6, root-hints, and auto-trust-anchor-file are somewhat recent
additions to my unbound.conf, but I see the same behavior both with
unbound.conf as shown above and with those three additions commented out
(which brings it back to a configuration that has worked for several
years).
My OpenBSD machine is an APU2, which I have been using without issue for
over a year. My backup machine is, I think, an ALIX2D3. Apart from the
APU2 running amd64 and the ALIX running i386, the two machines are
configured exactly the same. The APU2 has been consistently maintained,
and this behavior started soon after I applied the libexpat update via
syspatch. The ALIX, however, has not been patched (it is running 6.5 as
it was at release). I do not know much about the inner workings of DNS,
so, thinking that perhaps the packets contain XML and that the recent
libexpat update was causing issues, I backed the update out of the APU2,
but I still get the same results. Similarly, swapping the (non-updated)
ALIX in place of the APU2 results in the same behavior.
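One way to narrow down where the answer gets mangled might be to look at
the raw class and TTL fields of each answer record, bypassing dig's
formatting. The following is only a sketch using the Python standard
library; the helper names (build_query, skip_name, answer_fields, query)
are made up for illustration, it speaks plain UDP only, and it does no
error handling beyond a timeout:

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a minimal DNS query: one question, type A, class IN, RD set."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def skip_name(msg, pos):
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:  # compression pointer, two bytes long
            return pos + 2
        pos += 1 + length


def answer_fields(msg):
    """Yield (type, class, ttl, rdata) for each answer record in msg."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    pos = 12  # fixed-size DNS header
    for _ in range(qdcount):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    for _ in range(ancount):
        pos = skip_name(msg, pos)
        rtype, rclass, ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        yield rtype, rclass, ttl, msg[pos:pos + rdlen]
        pos += rdlen


def query(server, name, timeout=3.0):
    """Send one UDP query to server and return the parsed answer records."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        msg, _ = s.recvfrom(4096)
    return list(answer_fields(msg))
```

Calling query("192.168.254.254", "web.whatsapp.com") against the local
unbound, and then the same call against some other resolver, would show
whether the class field of the A record is already 0 (instead of 1 for
IN) in the bytes on the wire.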
Please forgive my verbosity, but I figured more info is probably better
than less. My knowledge of DNS and other network services is limited --
I hope I have explained this in a way that can be understood.
Thanks,
Joe