I’ve deployed Cfengine 3.0.5p1 across 800 hosts. The Cfengine daemons misbehave on only one box, where I appear to be hitting a bug: cf-agent spins a single core at 100% user-space CPU utilization. Here are the details.
    $ /var/cfengine/bin/cf-agent -v
    ....
    ... f3 ------------------------------------------------------------------------
    cf3 # Extended system discovery is only available in version Nova and above
    cf3 Additional hard class defined as: 32_bit
    cf3 Additional hard class defined as: sunos_5_10
    cf3 Additional hard class defined as: sunos_i86pc
    cf3 Additional hard class defined as: sunos_i86pc_5_10
    cf3 Additional hard class defined as: i386
    cf3 Additional hard class defined as: i86pc
    cf3 GNU autoconf class from compile time: compiled_on_solaris2_10
    cf3 Address given by nameserver: 172.17.134.80
    cf3 Interface 1: lo0
    cf3 Interface 2: e1000g0
    cf3 Adding alias loghost..
    cf3 !! Cannot discover hardware IP, using DNS value
    ^C

So at the “cannot discover hardware IP” point, it hangs and spins the CPU to 100%. Looking at prstat -Lm output below:

    $ prstat -Lm
       PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
     16398 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0 190   0   0 cf-agent/1

Putting cf-agent into super debug mode, I see this....

Broken host:

    $ /var/cfengine/bin/cf-agent -ddd
    ....
    ....
    GetVariable(sys,ipv4_1[172_17_134_80]) type=(to be determined)
    IsExpandable(ipv4_1[172_17_134_80]) - syntax verify
    Found 0 variables in (ipv4_1[172_17_134_80])
    Looking for sys.ipv4_1[172_17_134_80]
    Searching for scope context sys
    Found scope reference sys
    GetVariable(sys,ipv4_1[172_17_134_80]): using scope 'sys' for variable 'ipv4_1[172_17_134_80]'

At which point, cf-agent hangs. Comparing this to a working host, this is what I see.
Working host:

    GetVariable(sys,ipv4_1[172_17_134_81]) type=(to be determined)
    IsExpandable(ipv4_1[172_17_134_81]) - syntax verify
    Found 0 variables in (ipv4_1[172_17_134_81])
    Looking for sys.ipv4_1[172_17_134_81]
    Searching for scope context sys
    Found scope reference sys
    GetVariable(sys,ipv4_1[172_17_134_81]): using scope 'sys' for variable 'ipv4_1[172_17_134_81]'
    No such variable found sys.ipv4_1[172_17_134_81]
    AddVariableHash(sys.ipv4_1[172_17_134_81]=172 (string) rtype=s)
    Searching for scope context sys
    Found scope reference sys
    CopyRvalItem(s)
    ScanScalar([172])
    DeleteRvalItem(l)
    DeleteRval NULL
    DeleteRvalItem(l)
    DeleteRval NULL
    Added Variable ipv4_1[172_17_134_81] at hash address 60 in scope sys with value (omitted)
    Trying to locate my IPv6 address
    Unappending Trying to locate my IPv6 address
    Unix_cf_popen(/sbin/ifconfig -a)
    Unix_cf_pclose(pp)
    cf_pwait - Waiting for process 12411
    Looking for environment from cf-monitor...
    Unappending Looking for environment from cf-monitor...
    Searching for scope context mon
    Found scope reference mon
    No variable matched
    NewScalar(mon,env_time,Sat Nov 20 00:28:23 2010)

So the broken host never gets to the “No such variable found sys.ipv4_1[172_17_134_80]” statement. So I know this is a problem with how Cfengine parses the network interfaces. The only thing is, I cannot see any difference at all between the working and non-working machines.
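One way to check a "no difference" claim like this mechanically rather than by eye is to mask the fields that are expected to differ (IPv4 and MAC addresses) and diff whatever remains. The following is only a minimal Python sketch under that assumption; the names mask_addresses and structural_diff are made up for illustration and are not part of Cfengine or any Solaris tool.

```python
import difflib
import re

# Patterns for the fields expected to differ between hosts.
# Solaris prints MAC octets without leading zeros (e.g. 0:14:4f:9e:cf:fe).
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
MAC = re.compile(r"\b[0-9a-f]{1,2}(?::[0-9a-f]{1,2}){5}\b", re.IGNORECASE)


def mask_addresses(text):
    """Replace MAC and IPv4 addresses with fixed placeholders."""
    text = MAC.sub("<mac>", text)
    return IPV4.sub("<ipv4>", text)


def structural_diff(broken, working):
    """Return unified-diff lines for two ifconfig dumps after masking.

    An empty list means the dumps differ only in the masked addresses.
    """
    left = mask_addresses(broken).splitlines()
    right = mask_addresses(working).splitlines()
    return list(
        difflib.unified_diff(left, right, "broken", "working", lineterm="")
    )
```

An empty result only says the interface layout and flags match. Because the addresses are masked, it cannot say whether the specific address value itself is what triggers the hang.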
Broken machine’s ifconfig output:

    $ ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
            inet 127.0.0.1 netmask ff000000
    e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
            inet 172.17.134.80 netmask ffffff00 broadcast 172.17.134.255
            groupname primary
            ether 0:14:4f:9e:cf:fe
    e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
            inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
    e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
            inet 0.0.0.0 netmask 0
            groupname primary
            ether 0:14:4f:9e:cf:ff

Working machine’s ifconfig output:

    $ ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
            inet 127.0.0.1 netmask ff000000
    e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
            inet 172.17.134.81 netmask ffffff00 broadcast 172.17.134.255
            groupname primary
            ether 0:14:4f:83:31:ac
    e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
            inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
    e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
            inet 0.0.0.0 netmask 0
            groupname primary
            ether 0:14:4f:83:31:ad

So other than the inet address of e1000g0 and the ethernet addresses, the output is exactly the same. If I unplumb the interfaces e1000g0:1 and e1000g1 on the broken machine, the Cfengine daemons operate again.

Has anyone run into this bug before, or can anyone suggest anything?

Thanks!
Mike