Hi, Mike.

Would upgrading to 3.1.0 be an option for you?

Yours,
-at
On Fri, Nov 19, 2010 at 4:33 PM, Mike Svoboda <msvob...@linkedin.com> wrote:
> I’ve deployed Cfengine 3.0.5p1 across 800 hosts. I only have an issue with
> the Cfengine daemons on 1 box where it appears I am hitting a bug. On this
> machine, it spins a single core to 100% user space CPU utilization. Here
> are the details.
>
> $ /var/cfengine/bin/cf-agent -v
> ....
> ...
> f3 ------------------------------------------------------------------------
> cf3 # Extended system discovery is only available in version Nova and above
> cf3 Additional hard class defined as: 32_bit
> cf3 Additional hard class defined as: sunos_5_10
> cf3 Additional hard class defined as: sunos_i86pc
> cf3 Additional hard class defined as: sunos_i86pc_5_10
> cf3 Additional hard class defined as: i386
> cf3 Additional hard class defined as: i86pc
> cf3 GNU autoconf class from compile time: compiled_on_solaris2_10
> cf3 Address given by nameserver: 172.17.134.80
> cf3 Interface 1: lo0
> cf3 Interface 2: e1000g0
> cf3 Adding alias loghost..
> cf3 !! Cannot discover hardware IP, using DNS value
> ^C
>
> So at the “cannot discover hardware IP” point, it hangs and spins the CPU to
> 100%. Looking at prstat -Lm output below:
>
> $ prstat -Lm
>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
>  16398 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0 190   0   0 cf-agent/1
>
> Putting cf-agent into super debug mode, I see this....
>
> Broken host:
> $ /var/cfengine/bin/cf-agent -ddd
> ....
> ....
> GetVariable(sys,ipv4_1[172_17_134_80]) type=(to be determined)
> IsExpandable(ipv4_1[172_17_134_80]) - syntax verify
> Found 0 variables in (ipv4_1[172_17_134_80])
> Looking for sys.ipv4_1[172_17_134_80]
> Searching for scope context sys
> Found scope reference sys
> GetVariable(sys,ipv4_1[172_17_134_80]): using scope 'sys' for variable 'ipv4_1[172_17_134_80]'
>
> At which point, cf-agent hangs. Comparing this to a working host, this is
> what I see.
>
> Working host:
> GetVariable(sys,ipv4_1[172_17_134_81]) type=(to be determined)
> IsExpandable(ipv4_1[172_17_134_81]) - syntax verify
> Found 0 variables in (ipv4_1[172_17_134_81])
> Looking for sys.ipv4_1[172_17_134_81]
> Searching for scope context sys
> Found scope reference sys
> GetVariable(sys,ipv4_1[172_17_134_81]): using scope 'sys' for variable 'ipv4_1[172_17_134_81]'
> No such variable found sys.ipv4_1[172_17_134_81]
> AddVariableHash(sys.ipv4_1[172_17_134_81]=172 (string) rtype=s)
> Searching for scope context sys
> Found scope reference sys
> CopyRvalItem(s)
> ScanScalar([172])
> DeleteRvalItem(l)
> DeleteRval NULL
> DeleteRvalItem(l)
> DeleteRval NULL
> Added Variable ipv4_1[172_17_134_81] at hash address 60 in scope sys with value (omitted)
> Trying to locate my IPv6 address
> Unappending Trying to locate my IPv6 address
> Unix_cf_popen(/sbin/ifconfig -a)
> Unix_cf_pclose(pp)
> cf_pwait - Waiting for process 12411
> Looking for environment from cf-monitor...
> Unappending Looking for environment from cf-monitor...
> Searching for scope context mon
> Found scope reference mon
> No variable matched
> NewScalar(mon,env_time,Sat Nov 20 00:28:23 2010)
>
> So the broken host never gets to the “No such variable found
> sys.ipv4_1[172_17_134_80]” statement.
>
> So I know this is a problem with Cfengine parsing the network interfaces.
> The only thing is, I cannot see any difference at all between the working
> and non-working machines.
>
> Broken machine’s ifconfig output:
> $ ifconfig -a
> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>         inet 127.0.0.1 netmask ff000000
> e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
>         inet 172.17.134.80 netmask ffffff00 broadcast 172.17.134.255
>         groupname primary
>         ether 0:14:4f:9e:cf:fe
> e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>         inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
> e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
>         inet 0.0.0.0 netmask 0
>         groupname primary
>         ether 0:14:4f:9e:cf:ff
>
> Working machine’s ifconfig output:
> $ ifconfig -a
> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>         inet 127.0.0.1 netmask ff000000
> e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
>         inet 172.17.134.81 netmask ffffff00 broadcast 172.17.134.255
>         groupname primary
>         ether 0:14:4f:83:31:ac
> e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>         inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
> e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
>         inet 0.0.0.0 netmask 0
>         groupname primary
>         ether 0:14:4f:83:31:ad
>
> So other than the inet address of e1000g0 and the ethernet addresses, the
> output is exactly the same. If I unplumb the interfaces e1000g0:1 and
> e1000g1 on the broken machine, the Cfengine daemons operate again.
>
> Has anyone run into this bug before, or can help suggest anything?
>
> Thanks!
> Mike
>
> _______________________________________________
> Help-cfengine mailing list
> Help-cfengine@cfengine.org
> https://cfengine.org/mailman/listinfo/help-cfengine
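
The by-eye comparison of the two `ifconfig -a` outputs in the quoted report can also be done mechanically. What follows is a minimal Python sketch, not anything from Cfengine itself: the helper names `sample`, `parse_ifconfig`, and `diff_hosts` are hypothetical, and the parser only assumes the Solaris-style layout shown in the message (an interface header ending in a colon, a `flags=...<...>` token, then keyword/value pairs). It reports which fields differ between two hosts; it does not explain the hang.

```python
import re

# Sample data transcribed from the two ifconfig outputs quoted in the report.
TEMPLATE = """\
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet {ip} netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether {mac0}
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname primary
        ether {mac1}
"""


def sample(ip, mac0, mac1):
    """Fill the shared template with one host's address values (hypothetical helper)."""
    return TEMPLATE.format(ip=ip, mac0=mac0, mac1=mac1)


BROKEN = sample("172.17.134.80", "0:14:4f:9e:cf:fe", "0:14:4f:9e:cf:ff")
WORKING = sample("172.17.134.81", "0:14:4f:83:31:ac", "0:14:4f:83:31:ad")

# Interface header such as "lo0:", "e1000g0:" or the logical interface "e1000g0:1:".
HEADER = re.compile(r"^([a-z]+[0-9a-z]*(?::\d+)?):$")
FLAGS = re.compile(r"^flags=([0-9a-f]+)<([^>]*)>$")
# Keywords that are followed by exactly one value in the output above.
KEYS = {"mtu", "index", "inet", "netmask", "broadcast", "groupname", "ether"}


def parse_ifconfig(text):
    """Return {interface: {field: value}}; tolerant of email line wrapping."""
    tokens = text.split()
    interfaces = {}
    current = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        header = HEADER.match(tok)
        flags = FLAGS.match(tok)
        if header:
            current = interfaces.setdefault(header.group(1), {})
        elif current is not None and flags:
            current["flags"] = flags.group(2).split(",")
        elif current is not None and tok in KEYS and i + 1 < len(tokens):
            current[tok] = tokens[i + 1]
            i += 1  # skip the value we just consumed
        i += 1
    return interfaces


def diff_hosts(a, b):
    """List (interface, field, value_a, value_b) for every field that differs."""
    diffs = []
    for name in sorted(set(a) | set(b)):
        fa, fb = a.get(name, {}), b.get(name, {})
        for key in sorted(set(fa) | set(fb)):
            if fa.get(key) != fb.get(key):
                diffs.append((name, key, fa.get(key), fb.get(key)))
    return diffs


if __name__ == "__main__":
    for name, key, broken, working in diff_hosts(
        parse_ifconfig(BROKEN), parse_ifconfig(WORKING)
    ):
        print(f"{name} {key}: broken={broken} working={working}")
```

On the sample data, the only differing fields are the `inet` address of e1000g0 and the two `ether` addresses, which matches what the report says. To compare live hosts, the saved output of `ifconfig -a` from each machine could be passed to `parse_ifconfig` in place of the sample strings.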