Re: intermittent SERVFAIL for high visible domains such as *.google.com
Brian J. Murrell wrote:
>
> that demonstrates how BIND is getting .com referrals from the root
> servers when doing a query for www.google.com and then doing nothing
> with those referrals before returning a SERVFAIL.

That indicates that it has already marked the servers as lame, so the
packet trace isn't going to tell you what caused the lameness.

The thing to look out for is the minutes before the outage starts - see
what kind of failures you get.

Also, check the logs for EDNS or lame-servers complaints before an outage
starts, which I hope will give you a better idea of how long the problem
is (e.g. start off around the 10 minute mark suggested by the lame-ttl
setting).

Good luck :-)

Tony.
-- 
f.anthony.n.finch http://dotat.at/ - I xn--zr8h punycode
South Utsire, Northeast Forties: Southerly or southeasterly 5 to 7,
increasing gale 8 at times. Moderate or rough, occasionally very rough in
South Utsire. Occasional rain. Good, occasionally poor.
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
Re: intermittent SERVFAIL for high visible domains such as *.google.com
On Mon, 2018-01-22 at 12:04 +, Tony Finch wrote:
>
> That indicates that it has already marked the servers as lame, so the
> packet trace isn't going to tell you what caused the lameness.

OK.

> The thing to look out for is the minutes before the outage starts -
> see what kind of failures you get.
>
> Also, check the logs for EDNS

What do EDNS problem messages look like? Just something to grep for I
mean.

> or lame-servers complaints

Does the "lame:1" in this message indicate lameness:

18-Jan-2018 11:12:47.103 fetch completed at resolver.c:3074 for 149.243.194.103.in-addr.arpa/PTR in 0.000744: failure/success [domain:243.194.103.in-addr.arpa,referral:0,restart:1,qrysent:0,timeout:0,lame:1,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

Of course, that one is irrelevant to my situation, I'm just using it as
an example of how to find lame delegations.

Cheers,
b.
Re: intermittent SERVFAIL for high visible domains such as *.google.com
Brian J. Murrell wrote:
>
> What do EDNS problem messages look like? Just something to grep for I
> mean.

They'll have a log category of edns-disabled. But, looking through the
code, if this is leading to lameness you will also get lame-servers log
messages.

> > or lame-servers complaints

lame-servers is also a log category, and tends to be quite noisy about
various problems :-)

> Does the "lame:1" in this message indicate lameness:
>
> 18-Jan-2018 11:12:47.103 fetch completed at resolver.c:3074 for
> 149.243.194.103.in-addr.arpa/PTR in 0.000744: failure/success
> [domain:243.194.103.in-addr.arpa,referral:0,restart:1,qrysent:0,timeout:0,lame:1,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

The tagged values there are various kinds of things that happened when
resolving; the lame: tag is a count of the lame servers that were
encountered, including both newly discovered lame servers and cached lame
servers.

The other error-related numbers are worth paying attention to, I think -
timeout, neterr, badresp, adberr, findfail, valfail.

Tony.
-- 
f.anthony.n.finch http://dotat.at/ - I xn--zr8h punycode
Tyne, Dogger: West, backing south, 5 or 6, occasionally 7 later. Slight or
moderate, occasionally rough. Occasional rain or showers. Good,
occasionally poor.
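The bracketed tags in the "fetch completed" line quoted above can be pulled out mechanically rather than read by eye. The following is only a minimal sketch in Python, assuming the line layout is exactly as in the example from this thread; the function name parse_fetch_counters is made up for illustration and is not part of BIND.

```python
import re

# Matches the bracketed "[key:value,...]" tail of a "fetch completed" line.
# Assumption: the tail looks like the example quoted in this thread.
COUNTERS_RE = re.compile(r"\[(?P<body>[^\]]*)\]\s*$")


def parse_fetch_counters(line):
    """Return the bracketed tags of a 'fetch completed' line as a dict.

    Numeric values become ints; anything else (e.g. the domain) stays a
    string. Returns an empty dict if the line has no bracketed tail.
    """
    match = COUNTERS_RE.search(line)
    if match is None:
        return {}
    tags = {}
    for item in match.group("body").split(","):
        key, _, value = item.partition(":")
        tags[key] = int(value) if value.isdigit() else value
    return tags


# The example line from the message above, split only for readability.
line = ("18-Jan-2018 11:12:47.103 fetch completed at resolver.c:3074 for "
        "149.243.194.103.in-addr.arpa/PTR in 0.000744: failure/success "
        "[domain:243.194.103.in-addr.arpa,referral:0,restart:1,qrysent:0,"
        "timeout:0,lame:1,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]")

tags = parse_fetch_counters(line)
print(tags["lame"], tags["timeout"])
```

With the tags in a dict it is easy to check the other counters Tony mentions (timeout, neterr, badresp, adberr, findfail, valfail) in the same pass.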
one domain not resolving via response-policy zone
Hi List,

I set up a response-policy zone to override some records from external
DNS servers I can't control.

My db.rpz zone file:

$TTL 4H
@ IN SOA localhost. kai.mydomain.com. (
        2018012212 ; serial
        5M ; refresh
        5M ; retry
        4W ; expiry
        5M) ; minimum
  IN NS localhost.
localhost A 127.0.0.1
ulf.test.google.de A 192.168.0.1
gerd.test.google.de A 192.168.0.2
bild.de A 192.168.0.3
somehost.ov.otto.de A 10.0.0.1
otherhost.ov.otto.de A 10.0.0.2
heise.de A 192.168.0.4

In my options I just added

response-policy { zone "rpz"; };

What really drives me crazy is that the overrides for the google and
heise domains work, but the ones for the otto.de domains do not.

If I do an nslookup for one of the otto.de domains I receive "** server
can't find somehost.ov.otto.de: SERVFAIL"

Any hints for me?

Thanks and best regards,
Kai
Re: intermittent SERVFAIL for high visible domains such as *.google.com
On Mon, 2018-01-22 at 12:45 +, Tony Finch wrote:
>
> They'll have a log category of edns-disabled.

But if the problem were EDNS, would it be so intermittent and always
fixable by rndc reload?

> But, looking through the code, if this is leading to lameness you
> will also get lame-servers log messages.

So just looking for lame servers will cover EDNS issues also then,
right?

> lame-servers is also a log category, and tends to be quite noisy
> about various problems :-)

Yeah. Must be disabled by default on EL7 I would guess, just because
it's so noisy.

> The tagged values there are various kinds of things that happened
> when resolving; the lame: tag is a count of the lame servers that
> were encountered, including both newly discovered lame servers and
> cached lame servers.

So, if lame servers were a problem with resolving ns[1-4].google.com,
then I would see messages like in my previous message with a lame:n tag
where n > 0, yes?

Cheers,
b.
Re: intermittent SERVFAIL for high visible domains such as *.google.com
On Mon, 2018-01-22 at 12:04 +, Tony Finch wrote:
>
> The thing to look out for is the minutes before the outage starts -
> see what kind of failures you get.

So, taking this approach, looking for the first occurrence of just any
one of the names ns[1-4].google.com prior to the A/ queries that are in
http://brian.interlinx.bc.ca/named.run.log starting at:

19-Jan-2018 18:04:50.785 createfetch: ns1.google.com A

(which end up resulting in the SERVFAIL for www.google.com/IN/A) the
first previous occurrence of just any one of those names is:

19-Jan-2018 17:48:59.122 resquery 0x7f10102ecd50 (fctx 0x7f10102e5dc0(lh4.ggpht.com/)): response
19-Jan-2018 17:48:59.122 received packet:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3024
;; flags: qr cd; QUESTION: 1, ANSWER: 0, AUTHORITY: 8, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;lh4.ggpht.com. IN

;; AUTHORITY SECTION:
ggpht.com. 172800 IN NS ns2.google.com.
ggpht.com. 172800 IN NS ns1.google.com.
ggpht.com. 172800 IN NS ns3.google.com.
ggpht.com. 172800 IN NS ns4.google.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20180124054922 20180117043922 46967 com. pjslTFtda4UfkpJtO9rbVmzSRQ+JslWRuBl/r0tkeyX4nBA8wjOIQjCH DJl+C6CA8TMW lO9dfx5ZHM2s59N/XfQG3fp2N68bf3rhSp5OwUEVy205 6LMbiiW7wjp0MEQOGorvf29kS6ApuZHGOseP5HQrAIBO4XxZvomAPME+ Q1c=
FGFB71PIIJ5JUGA7GFUQ06ANFUVDRKBA.com. 86400 IN NSEC3 1 1 0 - FGFGQ2SH7LNK03PV0R76S8B47TPVJK59 NS DS RRSIG
FGFB71PIIJ5JUGA7GFUQ06ANFUVDRKBA.com. 86400 IN RRSIG NSEC3 8 2 86400 20180125052147 20180118041147 46967 com. DkAophVbTjntmUtcj2HIiigTv5yxlNuTIAGWgXY+W9QhAJp4UUYpqxOe jmyxVEUtfYqS 3ANVWz7EI+ucYS1CE8UKuWUx4eGAz8F/YbN/KA5cvxWO SEqri5Lg3W2MjiB/DXXFI/WrnmuLPNIQdDZD2H1lQ56CTUAL0pPpDby9 788=

;; ADDITIONAL SECTION:
ns2.google.com. 172800 IN A 216.239.34.10
ns1.google.com. 172800 IN A 216.239.32.10
ns3.google.com. 172800 IN A 216.239.36.10
ns4.google.com. 172800 IN A 216.239.38.10

I realize this query result has nothing to do with www.google.com, but
it is the first occurrence of just any of the names ns[1-4].google.com
prior to the start of the subsequent SERVFAIL processing that starts at
18:04:50.785, and it's more than 10 minutes prior to the SERVFAIL.

That seems to indicate that nothing at all to do with any of the names
ns[1-4].google.com happens for more than 10 minutes before a SERVFAIL
is returned for www.google.com, right? Nothing at all happens that
could result in any of those names being lame, right?

Cheers,
b.
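The "more than 10 minutes" claim above is simple arithmetic on the two log timestamps, and it can be checked with a few lines of Python. This is only a sketch: the 10-minute threshold is the lame-ttl figure mentioned earlier in the thread, taken here as an assumption rather than read from any configuration, and the helper name gap_between is made up for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the named.run excerpts quoted in this thread.
# Note: %b parses "Jan" in the default C/English locale.
LOG_TIME_FORMAT = "%d-%b-%Y %H:%M:%S.%f"

# Assumption: a 10-minute lame-ttl, as suggested earlier in the thread.
ASSUMED_LAME_TTL = timedelta(minutes=10)


def gap_between(earlier, later):
    """Return the time elapsed between two log timestamps as a timedelta."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return end - start


# Last earlier mention of ns[1-4].google.com, and the start of the
# fetches that end in SERVFAIL, both copied from the message above.
gap = gap_between("19-Jan-2018 17:48:59.122", "19-Jan-2018 18:04:50.785")

print(gap)                      # elapsed time between the two events
print(gap > ASSUMED_LAME_TTL)   # True if the gap exceeds the assumed lame-ttl
```

The two events are 15 minutes 51.663 seconds apart, so under the 10-minute assumption any lame marking would have had to come from something that is not visible in these particular log lines.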
Re: 9.11 can't validate sss.gov
I've informed the selective service (sss.gov) of the issue. They have
supposedly passed it on to their "web support group". We will see if
anything happens but I'm not holding my breath. At least a government
agency should have more influence to get qwest to fix their servers than
I do.

Timothy A. Holtzen
Campus Network Administrator
Nebraska Wesleyan University
Public PGP key CFB4 3AE8 B726 DEBF 00D9 CCFC 426E 76AF DABC B3D7

On 01/19/2018 05:04 PM, Mark Andrews wrote:
> Yes, qwest were informed years ago that there severs are broken. Report this
> to the .gov site operators. The servers return BADVERS to the queries which
> was never part of the EDNS spec and is a invention of the servers developers.
> FORMERR was permissible by STD13 but this was tightened when the EDNS spec
> was revised to say ignore unknown EDNS options.
Re: intermittent SERVFAIL for high visible domains such as *.google.com
Brian J. Murrell wrote:
>
> Yeah. Must be disabled by default on EL7 I would guess, just because
> it's so noisy.

You should make sure it is enabled, because there are vital clues in
those log lines :-) Other categories you should check are
`edns-disabled` (which I already mentioned) and `resolver`.

> So, if lame servers were a problem with resolving ns[1-4].google.com,
> then I would see messages like in my previous message with a lame:n tag
> where n > 0, yes?

Yes, and you should track down when they occur and look for other error
indications around that time.

Tony.
-- 
f.anthony.n.finch http://dotat.at/ - I xn--zr8h punycode
Bailey, Fair Isle, Faeroes: Cyclonic, becoming south or southwest, 5 to 7,
increasing gale 8 at times. Rough or very rough, occasionally high at
first. Occasional rain or showers, squally later in Bailey. Good,
occasionally poor.
Re: one domain not resolving via response-policy zone
Hey Kai,

> If I do a nslookup for one of the otto.de domains I reveive "** server
> can't find somehost.ov.otto.de: SERVFAIL"

The guideline behind response-policy is that only an actual response
gets rewritten. This is usually an answer from a recursive lookup. If
you don't get an answer, there is nothing to rewrite. The SERVFAIL won't
be rewritten unless you tell BIND to do so.

You could try the 'qname-wait-recurse' option. I guess this isn't the
original purpose of this option, but based on the documentation this
should work for you.

From https://ftp.isc.org/isc/bind9/cur/9.11/doc/arm/Bv9ARM.ch06.html
> Using this option can cause error responses such as SERVFAIL
> to appear to be rewritten, since no recursion is being done to
> discover problems at the authoritative server.

Cheers
Felix

On 22.01.2018 13:58, Kai Wiechers wrote:
> Hi List,
> I setup a response-policy zone to override some Records from external
> DNS-Servers I can't control.
> My db.rpz Zonefile:
> $TTL 4H
> @ IN SOA localhost. kai.mydomain.com. (
> 2018012212 ; serial
> 5M ; refresh
> 5M ; retry
> 4W ; expiry
> 5M) ; minimum
> IN NS localhost.
> localhost A 127.0.0.1
> ulf.test.google.de A 192.168.0.1
> gerd.test.google.de A 192.168.0.2
> bild.de A 192.168.0.3
> somehost.ov.otto.de A 10.0.0.1
> otherhost.ov.otto.de A 10.0.0.2
> heise.de A 192.168.0.4
> In my options I just added
> response-policy { zone "rpz"; };
> What really drives me crazy is, that the override of the google and
> heise domain is working. But the otto.de domains not.
> If I do a nslookup for one of the otto.de domains I reveive "** server
> can't find somehost.ov.otto.de: SERVFAIL"
> Any hints for me?
> Thanks and best regards,
> Kai
Re: 9.11 can't validate sss.gov
Unrelated to the DNS bit, but still silly / annoying:
http://www.sss.gov works OK, but http://sss.gov always seems to return
"The requested service is temporarily unavailable. It is either
overloaded or under maintenance. Please try later.".

There is a fair bit of disagreement over whether a bare domain should
resolve / have a web-server listening, but ISTM that if you *do*, you
should have it work -- I wonder how many people have tried the bare
domain and never realized that adding in the 'www' "fixes" it.

W

On Mon, Jan 22, 2018 at 11:08 AM, Timothy A. Holtzen wrote:
> I've informed the selective service (sss.gov) of the issue. They have
> supposedly passed it on to their "web support group". We will see if
> anything happens but I'm not holding my breath. At least a government
> agency should have more influence to get qwest to fix their servers than
> I do.
>
> Timothy A. Holtzen
> Campus Network Administrator
> Nebraska Wesleyan University
> Public PGP key CFB4 3AE8 B726 DEBF 00D9 CCFC 426E 76AF DABC B3D7
>
> On 01/19/2018 05:04 PM, Mark Andrews wrote:
>> Yes, qwest were informed years ago that there severs are broken. Report this
>> to the .gov site operators. The servers return BADVERS to the queries which
>> was never part of the EDNS spec and is a invention of the servers
>> developers. FORMERR was permissible by STD13 but this was tightened when
>> the EDNS spec was revised to say ignore unknown EDNS options.

-- 
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.
   ---maf
Re: 9.11 can't validate sss.gov
On 01/22/2018 09:21 AM, Warren Kumari wrote:
> http://www.sss.gov works OK, but http://sss.gov always seems to return
> "The requested service is temporarily unavailable. It is either
> overloaded or under maintenance. Please try later.".

Inconsistency between related things is annoying. I guess props for
consistently returning different things.

> There is a fair bit os disagreement over if a bare domain should
> resolve / have a web-server listening, but ISTM that if you do, you
> should have it work

I agree that this (at the very least) violates (what I consider to be)
reasonable expectation and surprises users.

I'm of the opinion that if you have www.sss.gov and sss.gov, they
should behave in very similar and related ways.

My personal preference would be for sss.gov to 30[1267] redirect to
www.sss.gov. Ideally any non-HTTPS to HTTPS.

> I wonder how many people have tried the bare domain and never
> realized that adding in the 'www' "fixes" it.

I expect that there are a lot more than we may think.

-- 
Grant. . . .
unix || die
Re: intermittent SERVFAIL for high visible domains such as *.google.com
On Mon, 2018-01-22 at 16:10 +, Tony Finch wrote:
>
> You should make sure it is enabled, because there are vital clues in
> those log lines :-)

But they will only occur if there is some lameness with the
ns[1-4].google.com records, and that will already be reported with
lame:n in the "fetch completed at resolver.c" lines, won't it? Or am I
completely misunderstanding something here?

> Yes, and you should track down when they occur and look for other
> error indications areound that time.

So, over the last week of tracing I have only these lines which match
"fetch completed at resolver.c:[0-9]* for ns[1-4].google.com":

19-Jan-2018 09:41:53.347 fetch completed at resolver.c:7492 for ns4.google.com/ in 0.042154: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
19-Jan-2018 09:41:53.350 fetch completed at resolver.c:7492 for ns2.google.com/ in 0.042019: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
19-Jan-2018 09:41:53.356 fetch completed at resolver.c:7492 for ns3.google.com/ in 0.043881: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
19-Jan-2018 09:41:53.362 fetch completed at resolver.c:7492 for ns1.google.com/ in 0.047039: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

None of them show any lame servers. Wouldn't I see occurrences of those
with lame:n if there were any lameness?

Cheers,
b.
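The search described above (match the "fetch completed" lines for ns[1-4].google.com, then look at the counters) can be scripted so that only lines with a nonzero error counter are reported. What follows is a rough Python sketch, assuming the line layout shown in this thread; the function name suspicious_fetches, the name_pattern parameter and the sample_lines list are invented for illustration and are not part of BIND.

```python
import re

# Error-related tags named earlier in the thread, plus "lame".
ERROR_TAGS = ("timeout", "lame", "neterr", "badresp",
              "adberr", "findfail", "valfail")


def suspicious_fetches(lines, name_pattern=r"ns[1-4]\.google\.com"):
    """Yield (name, {tag: count}) for 'fetch completed' lines whose name
    matches name_pattern and that have at least one nonzero error tag."""
    fetch_re = re.compile(
        r"fetch completed at resolver\.c:\d+ for (?P<name>" + name_pattern +
        r")/\S* in .*\[(?P<tags>[^\]]*)\]"
    )
    for line in lines:
        match = fetch_re.search(line)
        if match is None:
            continue
        # Split "key:value" items into a dict of strings.
        tags = dict(item.partition(":")[::2]
                    for item in match.group("tags").split(","))
        nonzero = {tag: int(tags[tag]) for tag in ERROR_TAGS
                   if tags.get(tag, "0") != "0"}
        if nonzero:
            yield match.group("name"), nonzero


# Two lines copied from this thread: one clean, one with lame:1.
sample_lines = [
    "19-Jan-2018 09:41:53.347 fetch completed at resolver.c:7492 for "
    "ns4.google.com/ in 0.042154: success/success [domain:google.com,"
    "referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,"
    "adberr:0,findfail:0,valfail:0]",
    "18-Jan-2018 11:12:47.103 fetch completed at resolver.c:3074 for "
    "149.243.194.103.in-addr.arpa/PTR in 0.000744: failure/success "
    "[domain:243.194.103.in-addr.arpa,referral:0,restart:1,qrysent:0,"
    "timeout:0,lame:1,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]",
]

# Default pattern: only ns[1-4].google.com, where nothing is flagged.
print(list(suspicious_fetches(sample_lines)))
# Any name: the in-addr.arpa line with lame:1 is reported.
print(list(suspicious_fetches(sample_lines, name_pattern=r"[^/\s]+")))
```

Widening name_pattern is a cheap way to see whether errors are showing up for other names around the time of an outage, which is the kind of nearby evidence the earlier replies suggest looking for.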