Hi, Mike.  Is it workable to suggest upgrading to 3.1.0?

Yours,
-at


On Fri, Nov 19, 2010 at 4:33 PM, Mike Svoboda <msvob...@linkedin.com> wrote:
> I’ve deployed Cfengine 3.0.5p1 across 800 hosts.  I only have an issue with
> the Cfengine daemons on 1 box where it appears I am hitting a bug.  On this
> machine, it spins a single core to 100% user space CPU utilization.  Here
> are the details.
>
>
> $ /var/cfengine/bin/cf-agent -v
> ....
> ...
> cf3 ------------------------------------------------------------------------
> cf3 # Extended system discovery is only available in version Nova and above
> cf3 Additional hard class defined as: 32_bit
> cf3 Additional hard class defined as: sunos_5_10
> cf3 Additional hard class defined as: sunos_i86pc
> cf3 Additional hard class defined as: sunos_i86pc_5_10
> cf3 Additional hard class defined as: i386
> cf3 Additional hard class defined as: i86pc
> cf3 GNU autoconf class from compile time: compiled_on_solaris2_10
> cf3 Address given by nameserver: 172.17.134.80
> cf3 Interface 1: lo0
> cf3 Interface 2: e1000g0
> cf3 Adding alias loghost..
> cf3  !! Cannot discover hardware IP, using DNS value
> ^C
>
>
> So at the “cannot discover hardware IP” point, it hangs and spins the CPU to
> 100%.  The prstat -Lm output below shows the single cf-agent LWP at 100% USR:
>
>
> $ prstat -Lm
>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
>  16398 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0 190   0   0 cf-agent/1
>
>
> Putting cf-agent into super debug mode, I see this....
>
> Broken host:
> $ /var/cfengine/bin/cf-agent -ddd
> ....
> ....
> GetVariable(sys,ipv4_1[172_17_134_80]) type=(to be determined)
> IsExpandable(ipv4_1[172_17_134_80]) - syntax verify
> Found 0 variables in (ipv4_1[172_17_134_80])
> Looking for sys.ipv4_1[172_17_134_80]
> Searching for scope context sys
> Found scope reference sys
> GetVariable(sys,ipv4_1[172_17_134_80]): using scope 'sys' for variable 'ipv4_1[172_17_134_80]'
>
>
>
> At this point, cf-agent hangs.  On a working host, the same stage of the
> debug output continues as follows.
>
> Working host:
> GetVariable(sys,ipv4_1[172_17_134_81]) type=(to be determined)
> IsExpandable(ipv4_1[172_17_134_81]) - syntax verify
> Found 0 variables in (ipv4_1[172_17_134_81])
> Looking for sys.ipv4_1[172_17_134_81]
> Searching for scope context sys
> Found scope reference sys
> GetVariable(sys,ipv4_1[172_17_134_81]): using scope 'sys' for variable 'ipv4_1[172_17_134_81]'
> No such variable found sys.ipv4_1[172_17_134_81]
> AddVariableHash(sys.ipv4_1[172_17_134_81]=172 (string) rtype=s)
> Searching for scope context sys
> Found scope reference sys
> CopyRvalItem(s)
> ScanScalar([172])
> DeleteRvalItem(l)
> DeleteRval NULL
> DeleteRvalItem(l)
> DeleteRval NULL
> Added Variable ipv4_1[172_17_134_81] at hash address 60 in scope sys with value (omitted)
> Trying to locate my IPv6 address
> Unappending Trying to locate my IPv6 address
> Unix_cf_popen(/sbin/ifconfig -a)
> Unix_cf_pclose(pp)
> cf_pwait - Waiting for process 12411
> Looking for environment from cf-monitor...
> Unappending Looking for environment from cf-monitor...
> Searching for scope context mon
> Found scope reference mon
> No variable matched
> NewScalar(mon,env_time,Sat Nov 20 00:28:23 2010)
>
>
> So the broken host never gets to the “No such variable found
> sys.ipv4_1[172_17_134_80]” statement.
>
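
A rough sketch of how the two debug traces could be compared mechanically, in case it helps narrow down where the broken host stops. The function names and the `<ip>` placeholder are made up for illustration; the sample lines are copied from the traces quoted in this message (wrapped lines joined, working trace cut short). It only locates the first step the broken run never reached; it does not explain the hang.

```python
import re

# Matches dotted (172.17.134.80) and underscore (172_17_134_80) IPv4 forms.
IP_LIKE = re.compile(r"\d{1,3}(?:[._]\d{1,3}){3}")


def normalize(line):
    """Trim whitespace and mask host-specific addresses so traces line up."""
    return IP_LIKE.sub("<ip>", line.strip())


def first_missing_step(broken_lines, working_lines):
    """Return the first working-trace line the broken trace never produced."""
    broken = [normalize(l) for l in broken_lines if l.strip()]
    working = [normalize(l) for l in working_lines if l.strip()]
    for i, line in enumerate(working):
        if i >= len(broken) or broken[i] != line:
            return line
    return None  # broken trace covers everything in the working trace


BROKEN = [
    "GetVariable(sys,ipv4_1[172_17_134_80]) type=(to be determined)",
    "IsExpandable(ipv4_1[172_17_134_80]) - syntax verify",
    "Found 0 variables in (ipv4_1[172_17_134_80])",
    "Looking for sys.ipv4_1[172_17_134_80]",
    "Searching for scope context sys",
    "Found scope reference sys",
    "GetVariable(sys,ipv4_1[172_17_134_80]): using scope 'sys' for variable 'ipv4_1[172_17_134_80]'",
]

WORKING = [
    "GetVariable(sys,ipv4_1[172_17_134_81]) type=(to be determined)",
    "IsExpandable(ipv4_1[172_17_134_81]) - syntax verify",
    "Found 0 variables in (ipv4_1[172_17_134_81])",
    "Looking for sys.ipv4_1[172_17_134_81]",
    "Searching for scope context sys",
    "Found scope reference sys",
    "GetVariable(sys,ipv4_1[172_17_134_81]): using scope 'sys' for variable 'ipv4_1[172_17_134_81]'",
    "No such variable found sys.ipv4_1[172_17_134_81]",
    "AddVariableHash(sys.ipv4_1[172_17_134_81]=172 (string) rtype=s)",
]

if __name__ == "__main__":
    print(first_missing_step(BROKEN, WORKING))
```

With these excerpts the first missing step is the "No such variable found" line, which matches the observation above.
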
> So, I know this is a problem with Cfengine parsing the network interfaces.
> The catch is that I cannot see any difference at all between the working
> and non-working machines.
>
>
> Broken machine’s ifconfig output:
> $ ifconfig -a
> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>         inet 127.0.0.1 netmask ff000000
> e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
>         inet 172.17.134.80 netmask ffffff00 broadcast 172.17.134.255
>         groupname primary
>         ether 0:14:4f:9e:cf:fe
> e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>         inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
> e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
>         inet 0.0.0.0 netmask 0
>         groupname primary
>         ether 0:14:4f:9e:cf:ff
>
>
>
> Working machine’s ifconfig output:
> $ ifconfig -a
> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>         inet 127.0.0.1 netmask ff000000
> e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
>         inet 172.17.134.81 netmask ffffff00 broadcast 172.17.134.255
>         groupname primary
>         ether 0:14:4f:83:31:ac
> e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>         inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
> e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
>         inet 0.0.0.0 netmask 0
>         groupname primary
>         ether 0:14:4f:83:31:ad
>
>
>
> So other than the inet address of e1000g0 and the ethernet addresses, the
> output is exactly the same.  If I unplumb the interfaces e1000g0:1 and
> e1000g1 on the broken machine, the Cfengine daemons operate again.
>
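
For anyone wanting to double-check the "exactly the same" claim on their own hosts, here is a minimal sketch that masks IPv4 and ethernet addresses in two saved `ifconfig -a` outputs and diffs what is left. The helper names and the `<ip>`/`<mac>` placeholders are assumptions for illustration, and the sample text is only the e1000g0 stanza from each machine as quoted above. An empty diff says the text is structurally identical; it says nothing about why cf-agent spins.

```python
import difflib
import re

IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
# Solaris prints MAC octets without leading zeros, e.g. 0:14:4f:9e:cf:fe
MAC = re.compile(r"\b[0-9a-f]{1,2}(?::[0-9a-f]{1,2}){5}\b", re.IGNORECASE)


def mask_addresses(text):
    """Replace every MAC and dotted IPv4 address with a fixed placeholder."""
    masked = []
    for line in text.splitlines():
        line = MAC.sub("<mac>", line)
        line = IPV4.sub("<ip>", line)
        masked.append(line.rstrip())
    return masked


def structural_diff(output_a, output_b):
    """Unified diff of two ifconfig outputs once addresses are masked."""
    return list(
        difflib.unified_diff(
            mask_addresses(output_a),
            mask_addresses(output_b),
            lineterm="",
            n=0,
        )
    )


BROKEN = """\
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.80 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:9e:cf:fe
"""

WORKING = """\
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.81 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:83:31:ac
"""

if __name__ == "__main__":
    for line in structural_diff(BROKEN, WORKING):
        print(line)
```

Since masking hides every address, a difference that lives only in an address value (for example a 0.0.0.0 on one host but not the other) would not show up here and would need a separate look.
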
>
> Has anyone run into this bug before, or can anyone suggest anything?
>
> Thanks!
> Mike
>
>
>
> _______________________________________________
> Help-cfengine mailing list
> Help-cfengine@cfengine.org
> https://cfengine.org/mailman/listinfo/help-cfengine
>
>