Jeremy Chadwick wrote:
On Tue, Sep 28, 2010 at 10:59:04PM +0200, Miroslav Lachman wrote:
Jeremy Chadwick wrote:
On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote:
Hi,
we are using fetch command from cron to run PHP scripts periodically
and sometimes cron sends error e-mails like this:
fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable
resolver failure
[...]
Note: target domains are hosted on the server it-self and named too.
The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC
Can somebody help me to diagnose this random fetch+resolver issue?
[...]
There is PF with some basic rules, mostly blocking incomming
packets, allowing all outgoing and scrubbing:
scrub in on bge1 all fragment reassemble
scrub out on bge1 all no-df random-id min-ttl 24 max-mss 1492
fragment reassemble
pass out on bge1 inet proto udp all keep state
pass out on bge1 inet proto tcp from 1.2.3.40 to any flags S/SA
modulate state
pass out on bge1 inet proto tcp from 1.2.3.41 to any flags S/SA
modulate state
pass out on bge1 inet proto tcp from 1.2.3.42 to any flags S/SA
modulate state
modified PF options:
set timeout { frag 15, interval 5 }
set limit { frags 2500, states 5000 }
set optimization aggressive
set block-policy drop
set loginterface bge1
# Let loopback and internal interface traffic flow without restrictions
set skip on lo0
Please also provide "pfctl -s info" output, in addition to uname -a
output (you can hide the hostname), since the pf stack differs depending
on what FreeBSD version you're using.
# pfctl -s info
No ALTQ support in kernel
ALTQ related functions disabled
Status: Enabled for 32 days 11:31:02 Debug: Urgent
Interface Stats for bge1 IPv4 IPv6
Bytes In 37064314787 0
Bytes Out 279633869976 0
Packets In
Passed 214057477 0
Blocked 1180125 0
Packets Out
Passed 272266744 0
Blocked 128777 0
State Table Total Rate
current entries 181
searches 518860439 184.9/s
inserts 16608172 5.9/s
removals 16607991 5.9/s
Counters
match 17951131 6.4/s
bad-offset 0 0.0/s
fragment 23 0.0/s
short 0 0.0/s
normalize 4 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 0 0.0/s
ip-option 0 0.0/s
proto-cksum 3095 0.0/s
state-mismatch 16707 0.0/s
state-insert 0 0.0/s
state-limit 0 0.0/s
src-limit 0 0.0/s
synproxy 0 0.0/s
uname:
7.3-RELEASE-p2 FreeBSD 7.3-RELEASE-p2 #0: Mon Jul 12 19:04:04 UTC 2010
r...@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
Things that catch my eye as potential problems -- I don't have a way to
confirm these are responsible for your issue (DNS resolver lookups are
UDP-based, not TCP), but I want to point them out anyway.
1) "modulate state" is broken on FreeBSD. Taken from our pf.conf notes:
# Filtering (public interface only; see "set skip")
#
# NOTE: Do not use "modulate state", as it's known to be broken on FreeBSD.
# http://lists.freebsd.org/pipermail/freebsd-pf/2008-March/004227.html
2) "optimization aggressive" sounds dangerous given what pf.conf(5) says
about it. I'd like to know what it considers "idle".
3) I would also remove many of the options you have set in your "scrub
out" rule. Starting with a clean slate to see if things improve is
probably a good idea. As you'll see below, sometimes pf does things
which may be correct per IP specification but don't work quite right
with other vendors' IP stacks.
4) Your "set timeout" values look to be extreme. I would recommend
leaving these at their defaults given your situation.
5) This feature is not in use in your pf.conf, but I want to point out
regardless. "reassemble tcp" is also broken in some way. Again taken
from our pf.conf notes:
# Normalization -- resolve/reduce traffic ambiguities.
#
# NOTE: Do NOT use 'reassemble tcp' as it definitely causes breakage.
# Issue may be related to other vendors' IP stacks, so let's leave it
# disabled.
Thank you for all your hints about PF! Maybe it's time to consider
refactoring our standard pf.conf which was made years ago...
The original problem seems to be problem of how resolver on FreeBSD 7.3
works. This machine was upgraded from 7.2 few weeks ago and we had not
this problem before.
I added '|| dig hiden.example.com' to the crontab so I get dig output in
the case of fetch failure:
*/5 * * * * fetch -qo /dev/null
"https://hiden.example.com/cron/fiveminutes" || dig hiden.example.com
The domain has TTL set to 360 seconds and each fetch "Non-recoverable
resolver failure" is exactly in the time when TTL was expired and new
query to authoritative nameservers must be done:
; <<>> DiG 9.4.-ESV <<>> hiden.example.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30191
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 0
;; QUESTION SECTION:
;hiden.example.com. IN A
;; ANSWER SECTION:
hiden.example.com. 360 IN CNAME server.example.com.
server.example.com. 360 IN A 1.2.3.49
;; AUTHORITY SECTION:
example.com. 224 IN NS ns1.ignum.com.
example.com. 224 IN NS ns2.ignum.cz.
;; Query time: 395 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 30 11:30:16 2010
;; MSG SIZE rcvd: 135
Note: real domains and IPs were replaced with example.com / 1.2.3.49
I made some easy script to run dig queries to affected domains each 3
minutes from cron with logging to file. The script is in use for one day
and did not log any error response (resolving by dig command works fine)
and we got only one occurence of fetch "Non-recoverable resolver
failure" in the time when cached DNS entry expired (the above one), this
is coincidence where diq query from script was made in the same time as
fetch job. The same DNS answere was e-mailed from cron and loggend in to
file by the script.
So my thought is that DNS cache server (locally running BIND) is working
fine, authoritative nameservers too, but resolving the domain for the
first time and passing the reply to the fetch fails for unknown reason.
I will try to use curl or wget instead of fetch to see if the symptoms
persist or not.
Miroslav Lachman
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"