Hi Mike,
Yes, I've got this behaviour in two Solaris zones in a large roll-out. I'm on cfengine 3.0.4, Solaris 10. Hangs during network discovery and just spins. Have yet to figure out what's going on with it. Similarly, can't see what's different about the network setup for these two compared to the other ~300 servers.

Interested in solutions or suggestions to collect more debug info.

Simon

From: help-cfengine-boun...@cfengine.org [mailto:help-cfengine-boun...@cfengine.org] On Behalf Of Mike Svoboda
Sent: 20 November 2010 00:34
To: help-cfengine@cfengine.org
Subject: Cfengine 3.0.5p1 daemons spinning CPU to 100% on 1 host out of 800

I've deployed Cfengine 3.0.5p1 across 800 hosts. I only have an issue with the Cfengine daemons on 1 box where it appears I am hitting a bug. On this machine, it spins a single core to 100% user space CPU utilization. Here are the details.

$ /var/cfengine/bin/cf-agent -v
....
...
f3 ------------------------------------------------------------------------
cf3 # Extended system discovery is only available in version Nova and above
cf3 Additional hard class defined as: 32_bit
cf3 Additional hard class defined as: sunos_5_10
cf3 Additional hard class defined as: sunos_i86pc
cf3 Additional hard class defined as: sunos_i86pc_5_10
cf3 Additional hard class defined as: i386
cf3 Additional hard class defined as: i86pc
cf3 GNU autoconf class from compile time: compiled_on_solaris2_10
cf3 Address given by nameserver: 172.17.134.80
cf3 Interface 1: lo0
cf3 Interface 2: e1000g0
cf3 Adding alias loghost..
cf3 !! Cannot discover hardware IP, using DNS value
^C

So at the "Cannot discover hardware IP" point, it hangs and spins the CPU to 100%. Looking at prstat -Lm output below:

$ prstat -Lm
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 16398 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0 190   0   0 cf-agent/1

Putting cf-agent into super debug mode, I see this....

Broken host:

$ /var/cfengine/bin/cf-agent -ddd
....
....
GetVariable(sys,ipv4_1[172_17_134_80]) type=(to be determined)
IsExpandable(ipv4_1[172_17_134_80]) - syntax verify
Found 0 variables in (ipv4_1[172_17_134_80])
Looking for sys.ipv4_1[172_17_134_80]
Searching for scope context sys
Found scope reference sys
GetVariable(sys,ipv4_1[172_17_134_80]): using scope 'sys' for variable 'ipv4_1[172_17_134_80]'

At which point, cf-agent hangs. Comparing this to a working host, this is what I see.

Working host:

GetVariable(sys,ipv4_1[172_17_134_81]) type=(to be determined)
IsExpandable(ipv4_1[172_17_134_81]) - syntax verify
Found 0 variables in (ipv4_1[172_17_134_81])
Looking for sys.ipv4_1[172_17_134_81]
Searching for scope context sys
Found scope reference sys
GetVariable(sys,ipv4_1[172_17_134_81]): using scope 'sys' for variable 'ipv4_1[172_17_134_81]'
No such variable found sys.ipv4_1[172_17_134_81]
AddVariableHash(sys.ipv4_1[172_17_134_81]=172 (string) rtype=s)
Searching for scope context sys
Found scope reference sys
CopyRvalItem(s)
ScanScalar([172])
DeleteRvalItem(l)
DeleteRval NULL
DeleteRvalItem(l)
DeleteRval NULL
Added Variable ipv4_1[172_17_134_81] at hash address 60 in scope sys with value (omitted)
Trying to locate my IPv6 address
Unappending Trying to locate my IPv6 address
Unix_cf_popen(/sbin/ifconfig -a)
Unix_cf_pclose(pp)
cf_pwait - Waiting for process 12411
Looking for environment from cf-monitor...
Unappending Looking for environment from cf-monitor...
Searching for scope context mon
Found scope reference mon
No variable matched
NewScalar(mon,env_time,Sat Nov 20 00:28:23 2010)

So the broken host never gets to the "No such variable found sys.ipv4_1[172_17_134_80]" statement. That tells me the problem is in how Cfengine parses the network interfaces. The only thing is, I cannot see any difference at all between the working and non-working machines.
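The trace comparison above was done by eye. A rough sketch of automating it in Python follows: it masks the host-specific address in each line, finds where the tail of the broken trace occurs in the working trace, and reports the next step the working host reached. The function names and the three-line matching window are assumptions made for illustration; nothing here is part of Cfengine.

```python
import re

# Matches IPv4 addresses written with dots or underscores
# (the debug trace uses the underscore form, e.g. 172_17_134_80).
IP_RE = re.compile(r"\d{1,3}([._])\d{1,3}\1\d{1,3}\1\d{1,3}")


def normalize(line):
    """Strip whitespace and mask addresses so traces from two hosts compare equal."""
    return IP_RE.sub("<ip>", line.strip())


def next_expected_step(broken_lines, working_lines, window=3):
    """Return the working-trace line that follows the tail of the broken trace.

    The last `window` lines of the broken trace are located in the working
    trace; the line right after that match is the step the broken host
    never reached. Returns None if no match is found.
    """
    broken = [normalize(l) for l in broken_lines if l.strip()]
    working = [normalize(l) for l in working_lines if l.strip()]
    tail = broken[-window:]
    if not tail:
        return None
    n = len(tail)
    for start in range(len(working) - n):
        if working[start:start + n] == tail:
            return working[start + n]
    return None
```

Fed the two traces above, this should point at the "No such variable found" line (with the address masked), which matches the manual reading. It only shows where the hang starts, not why.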
Broken machine's ifconfig output:

$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.80 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:9e:cf:fe
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname primary
        ether 0:14:4f:9e:cf:ff

Working machine's ifconfig output:

$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.81 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:83:31:ac
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname primary
        ether 0:14:4f:83:31:ad

So other than the inet address of e1000g0 and the ethernet addresses, the output is exactly the same. If I unplumb the interfaces e1000g0:1 and e1000g1 on the broken machine, the Cfengine daemons operate again.

Has anyone run into this bug before, or can help suggest anything?

Thanks!
Mike
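The claim that the two ifconfig outputs differ only in host-specific addresses can be checked mechanically rather than by eye. A minimal Python sketch: mask each host's own IP and all MAC addresses, collapse whitespace, and diff what is left. The function names and placeholder tokens are made up for illustration, and the MAC pattern assumes the Solaris style of unpadded hex octets shown above.

```python
import difflib
import re

# Solaris prints MAC octets without zero padding, e.g. 0:14:4f:9e:cf:fe.
ETHER_RE = re.compile(r"\bether [0-9a-fA-F]{1,2}(?::[0-9a-fA-F]{1,2}){5}")


def normalize_ifconfig(text, host_ip):
    """Return ifconfig lines with the host's own IP and all MACs masked."""
    # Lookarounds keep e.g. 172.17.134.2 from matching inside 172.17.134.255.
    ip_re = re.compile(r"(?<![\d.])" + re.escape(host_ip) + r"(?![\d.])")
    lines = []
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse indentation and spacing
        if not line:
            continue
        line = ip_re.sub("<host-ip>", line)
        line = ETHER_RE.sub("ether <mac>", line)
        lines.append(line)
    return lines


def diff_ifconfig(broken_text, broken_ip, working_text, working_ip):
    """Unified diff of the two normalized outputs; an empty list means no difference."""
    a = normalize_ifconfig(broken_text, broken_ip)
    b = normalize_ifconfig(working_text, working_ip)
    return list(difflib.unified_diff(a, b, "broken", "working", lineterm=""))
```

An empty result would confirm that nothing visible in `ifconfig -a` separates the two hosts, so the trigger is more likely in how the output is parsed, or in something ifconfig does not show, than in the interface configuration itself.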