Re: fetch: Non-recoverable resolver failure

Miroslav Lachman Tue, 28 Sep 2010 14:00:23 -0700

Jeremy Chadwick wrote:

On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote:

Hi,


we are using the fetch command from cron to run PHP scripts periodically,
and sometimes cron sends error e-mails like this:

fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable
resolver failure


[...]

Note: the target domains are hosted on the server itself, and named runs there too.

The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC

Can somebody help me to diagnose this random fetch+resolver issue?


The error in question comes from the resolver library returning
EAI_FAIL.  This return code can be returned to all sorts of applications
(not just fetch), although how each app handles it may differ.  So,
chances are you really do have something going on upstream from you (one
of the nameservers you use might not be available at all times), and it
probably clears very quickly (before you have a chance to
manually/interactively investigate it).
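As a rough illustration of how an application can tell these resolver return codes apart, here is a minimal Python sketch. It is not from the thread: the function names classify_gai_error and resolve are hypothetical, and it assumes a platform where Python's socket module exposes EAI_FAIL, EAI_AGAIN and EAI_NONAME.

```python
import socket


def classify_gai_error(code):
    """Map a getaddrinfo() error code to a coarse category (hypothetical helper)."""
    if code == socket.EAI_FAIL:
        return "non-recoverable"   # the case fetch reports in this thread
    if code == socket.EAI_AGAIN:
        return "transient"         # temporary failure, a retry may succeed
    if code == socket.EAI_NONAME:
        return "not-found"         # name or service not known
    return "other"


def resolve(host, port=443):
    """Try to resolve host; return "ok" or the category of the failure."""
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return "ok"
    except socket.gaierror as exc:
        # gaierror carries the EAI_* code in exc.errno
        return classify_gai_error(exc.errno)
```

Running resolve() against the affected hostname in a loop, with timestamps, would be one way to see whether the failure also shows up outside of fetch.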

The strange thing is that I have only one nameserver listed in resolv.conf and it is the local one (127.0.0.1)! There were two "remote" nameservers, but I switched to the local one to rule out remote nameserver / network problems.

You're probably going to have to set up a combination of scripts that do
tcpdump logging, and ktrace -t+ -i (and probably -a) logging (ex. ktrace
-t+ -i -a -f /var/log/ktrace.fetch.out fetch -qo ...) to find out what's
going on behind the scenes.  The irregularity of the problem (re:
"sometimes") warrants such.  I'd recommend using something other than
127.0.0.1 as your resolver if you need to do tcpdump.
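A wrapper along the following lines could be used from cron to apply the ktrace invocation above and to timestamp each resolver failure. This is only a sketch under assumptions: the helper names (traced_command, is_resolver_failure, run_once) and the log file argument are hypothetical, the fetch arguments are passed in by the caller rather than guessed, and the failure check simply matches the text of the error message quoted earlier.

```python
import subprocess
import sys
import time


def traced_command(fetch_args, trace_file):
    """Prefix a fetch command line with the ktrace flags suggested above."""
    return ["ktrace", "-t+", "-i", "-a", "-f", trace_file] + list(fetch_args)


def is_resolver_failure(stderr_text):
    """True if fetch's stderr contains the resolver error seen in the cron mails."""
    return "resolver failure" in stderr_text


def run_once(fetch_args, trace_file, log_file):
    """Run fetch under ktrace; append a timestamped line to log_file on failure."""
    proc = subprocess.run(traced_command(fetch_args, trace_file),
                          capture_output=True, text=True)
    if is_resolver_failure(proc.stderr):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_file, "a") as log:
            log.write("%s %s\n" % (stamp, proc.stderr.strip()))
    # Pass fetch's own messages through so cron still mails them.
    sys.stderr.write(proc.stderr)
    return proc.returncode
```

The timestamps in the log can then be matched against the tcpdump capture and the ktrace output to narrow down which runs failed.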

I will try it... there will be a lot of output, as there are many cronjobs and relatively high traffic on the webserver. But the fetch resolver failure occurs only a few times a day.

Providing contents of your /etc/resolv.conf, as well as details about
your network configuration on the machine (specifically if any
firewall stacks (pf or ipfw) are in place) would help too.  Some folks
might want netstat -m output as well.

There is nothing special in the network: the machine is a Sun Fire X2100 M2 with a bge1 NIC connected to a Cisco Linksys switch (100Mbps port) with an uplink (1Gbps port) connected to a Cisco router with dual 10Gbps connectivity. No firewalls in the path. There are more than 10 other servers in the rack and we have no problems / error messages in logs from other services / daemons related to DNS.


# cat /etc/resolv.conf
nameserver 127.0.0.1


# netstat -m
279/861/1140 mbufs in use (current/cache/total)
257/553/810/25600 mbuf clusters in use (current/cache/total/max)
257/313 mbuf+clusters out of packet secondary zone in use (current/cache)
5/306/311/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
603K/2545K/3149K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
13/470/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
3351782 requests for I/O initiated by sendfile
0 calls to protocol drain routines


(real IPs were replaced)

# ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:1e:68:2f:71:ab
        inet 1.2.3.40 netmask 0xffffff80 broadcast 1.2.3.127
        inet 1.2.3.41 netmask 0xffffffff broadcast 1.2.3.41
        inet 1.2.3.42 netmask 0xffffffff broadcast 1.2.3.42
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active


NIC is:

b...@pci0:6:4:1: class=0x020000 card=0x534c108e chip=0x167814e4 rev=0xa3 hdr=0x00

    vendor     = 'Broadcom Corporation'
    device     = 'BCM5715C 10/100/100 PCIe Ethernet Controller'
    class      = network
    subclass   = ethernet

There is PF with some basic rules, mostly blocking incoming packets, allowing all outgoing and scrubbing:


scrub in on bge1 all fragment reassemble
scrub out on bge1 all no-df random-id min-ttl 24 max-mss 1492 fragment reassemble


pass out on bge1 inet proto udp all keep state

pass out on bge1 inet proto tcp from 1.2.3.40 to any flags S/SA modulate state
pass out on bge1 inet proto tcp from 1.2.3.41 to any flags S/SA modulate state
pass out on bge1 inet proto tcp from 1.2.3.42 to any flags S/SA modulate state


Modified PF options:

set timeout { frag 15, interval 5 }
set limit { frags 2500, states 5000 }
set optimization aggressive
set block-policy drop
set loginterface bge1
# Let loopback and internal interface traffic flow without restrictions
set skip on lo0


Thank you for your suggestions

Miroslav Lachman
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
