Dear Wes,

this is disturbing. I've not heard of anything like this, so it's difficult 
to know what might be happening here. Could you let us know the exact 
policy you are using on the client and server sides, please? Are you 
using encryption, and is the encryption library the same on each end? Do 
I understand correctly that you can't reproduce the issue any more?

We work pretty hard to make sure that CFEngine is absolutely safe to run 
and upgrade, so any info is appreciated.

Mark

On 07/05/2012 06:29 PM, Wes Hardin wrote:
> In the past week, I've had two separate occurrences of what I believe to be 
> two pretty serious issues.  Another admin cleared the issues the first time 
> they happened, before capturing any debugging output (other than emails from 
> Cfengine and cron), but this second time I've kept the machine in the broken 
> state for debugging purposes.
>
> In both cases the central policy host is running 3.3.2 and the remote node is 
> running 3.2.0.  Both are self-compiled to allow for a custom WORKDIR.
>
> First, a policy file was only partially transferred and the partial copy 
> overwrote the existing file.  Obviously, this causes validation of my policy 
> to then fail, which is the root of the second issue.  I'll get to that in 
> just a moment.  Cfengine is usually pretty good about   This is what was 
> captured in the outputs directory by cf-execd:
>
> # cat ../outputs/previous
> Failed send
>   !!! System reports error for recv: "Resource temporarily unavailable"
> I: Made in version 'not specified' of 
> '/var/cache/cfengine/inputs/promises.cf' near line 118
> I: Comment: Update local policy cache from master policy server
>   !! Transmission refused or failed statting 
> /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/eod.cf
> Got: CFD_FALSE
>   !! Transmission refused or failed statting 
> /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/dns.cf
> Got:
>   !! Transmission refused or failed statting 
> /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cobbler.cf
> Got:
>   !! Transmission refused or failed statting 
> /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/ldap.cf
> Got:
>   !! Transmission refused or failed statting 
> /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cfengine_stdlib.cf
> Got:
>   !! Transmission refused or failed statting 
> /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cfupgrade.cf
> Got:
>   !! Transmission refused or failed statting 
> /etc/cfengine/masterfiles/global/any/var/cache/cfengine/modules
> Got:
> #
>
> The actual file that was corrupted is not listed above but is found by 
> cf-promises.
>
> # cf-promises
> cf3> /var/cache/cfengine/inputs/globals.cf:1,2: Something defined outside of 
> a block or missing punctuation in input, near token 't'
> cf3> /var/cache/cfengine/inputs/globals.cf:1,2: syntax error, near token 't'
> #
>
> My cached globals.cf got truncated about a third of the way through the file.  There 
> is plenty of disk space, the cache is on local disk, no indications of 
> hardware failure, and no other applications appear to be affected.
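
For context, one common way to stop a short transfer from clobbering a good 
file is to stage the incoming copy in a temporary file, check its size and 
digest against what the server advertised, and only then rename it over the 
cached copy. The Python below is a minimal sketch of that pattern under those 
assumptions; `safe_replace`, its parameters, and the `.cfnew` suffix are 
illustrative choices, not CFEngine's actual code path in 3.2.0 or 3.3.2.

```python
import hashlib
import os
import tempfile


def safe_replace(dest, data, expected_size, expected_md5):
    """Hypothetical helper: install `data` at `dest` only if it looks complete.

    Returns True if `dest` was replaced, False if the transfer looked
    truncated or corrupted (in which case `dest` is left untouched).
    """
    # Reject short or corrupted transfers before touching the live file.
    if len(data) != expected_size:
        return False
    if hashlib.md5(data).hexdigest() != expected_md5:
        return False
    # Stage in the same directory so the rename stays on one filesystem
    # and is atomic on POSIX systems.
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".cfnew")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic swap: readers see either the old file or the new one.
        os.replace(tmp_path, dest)
    except BaseException:
        # Clean up the staged copy; the old cached file is still intact.
        os.unlink(tmp_path)
        raise
    return True
```

Because the size and digest checks happen before anything is written, a 
truncated globals.cf would be rejected and the previous cached copy left in 
place.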
>
> Normally, such a policy validation failure would be fixed by cf-agent falling 
> back to the failsafe.cf (assuming that file is also not corrupt).  But for 
> whatever reason, cf-agent did not execute the failsafe.  This is my second 
> serious issue.  cf-agent does not segfault or otherwise crash; it reports no 
> errors other than the invalid inputs.  Normally on errors like this, one 
> would see lines like these:
>
> ##
> Fatal cfengine error: Too many errors
> cf-agent was not able to get confirmation of promises from cf-promises, so 
> going to failsafe
> ##
>
> but I only got the first line and then a normal exit.
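
The fallback Wes expected can be modelled roughly like this: validate the 
main policy, and if that fails, validate and select failsafe.cf instead, so 
that a failed validation never ends in a silent normal exit. This is a 
simplified Python sketch, not cf-agent's real logic; `choose_policy` and 
`cf_promises_ok` are hypothetical names, and the validator assumes 
`cf-promises` is on PATH and exits non-zero when it rejects an input file.

```python
import os
import subprocess
from typing import Callable, Optional


def cf_promises_ok(path: str) -> bool:
    """Hypothetical validator: assumes cf-promises is on PATH and exits
    non-zero when the given input file fails to parse."""
    result = subprocess.run(["cf-promises", "-f", path], capture_output=True)
    return result.returncode == 0


def choose_policy(
    inputs_dir: str,
    validate: Callable[[str], bool] = cf_promises_ok,
    main: str = "promises.cf",
    failsafe: str = "failsafe.cf",
) -> Optional[str]:
    """Return the policy file the agent should run, or None if neither validates."""
    main_path = os.path.join(inputs_dir, main)
    if validate(main_path):
        return main_path
    # Main policy rejected (for example because a file it includes, such as
    # globals.cf, was truncated): fall back to the failsafe instead of exiting.
    failsafe_path = os.path.join(inputs_dir, failsafe)
    if validate(failsafe_path):
        return failsafe_path
    # Nothing usable; the caller should report an error and exit non-zero.
    return None
```

Injecting `validate` keeps the decision logic testable without a CFEngine 
install; for example, passing a validator that accepts only failsafe.cf makes 
`choose_policy("/var/cache/cfengine/inputs", ...)` return the failsafe path.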
>
> Debug mode didn't really offer me much more to go on either.
>
>
> ##
> Fatal cfengine error: Too many errors
>
> GetVariable(control_agent,track_value) type=(to be determined)
> IsExpandable(track_value) - syntax verify
> Found 0 variables in (track_value)
> Looking for control_agent.track_value
> Searching for scope context control_agent
> Found scope reference control_agent
> GetVariable(control_agent,track_value): using scope 'control_agent' for 
> variable 'track_value'
> No such variable found control_agent.track_value
>
>
> GetVariable(control_common,version) type=(to be determined)
> IsExpandable(version) - syntax verify
> Found 0 variables in (version)
> Looking for control_common.version
> Searching for scope context control_common
> Found scope reference control_common
> GetVariable(control_common,version): using scope 'control_common' for 
> variable 'version'
> No such variable found control_common.version
>
> Outcome of version (not specified): No checks were scheduled
> GenericDeInitialize()
> CloseAllDB()
> Closed 0 open DB handles
> ##
>
> At this point, whatever troubleshooting I've done has managed to twiddle 
> enough bits that the failsafe is kicking in now, so I'm unable to capture any 
> more debug data.

-- 

CTO and Founder
CFEngine

http://www.cfengine.com
http://www.markburgess.org
Twitter: @markburgess_osl, @CFEngine_news



_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
