Of course, Cfengine3 behaves the same way: a file is never installed
directly in place of an older one.
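For illustration only, here is a minimal Python sketch of the copy-then-rename pattern described in the quoted messages: stage the data in `<filename>.cfnew`, check it, and only then move it into place. It is not CFEngine's source code. `safe_install` and `CorruptTransfer` are hypothetical names, the byte stream stands in for the connection to cf-serverd, and the size check is modelled on the agent's "dest 0 and src 1844" message quoted further down.

```python
import io
import os
import tempfile


class CorruptTransfer(Exception):
    """Raised when the staged copy's size differs from the announced source size."""


def safe_install(stream, expected_size, dest_path, chunk_size=65536):
    """Copy `stream` into `dest_path` without ever truncating the live file.

    Hypothetical helper: data goes to `<dest_path>.cfnew` first, and only a
    staged file whose size matches `expected_size` is renamed over the original.
    """
    staged = dest_path + ".cfnew"
    written = 0
    with open(staged, "wb") as out:
        while True:
            block = stream.read(chunk_size)
            if not block:  # EOF, or the sender went away mid-transfer
                break
            out.write(block)
            written += len(block)
        out.flush()
        os.fsync(out.fileno())  # get the bytes onto disk before the rename

    if written != expected_size:
        os.remove(staged)  # discard the partial copy; dest_path is untouched
        raise CorruptTransfer(
            f"{staged} seems to have been corrupted in transit "
            f"(dest {written} and src {expected_size})"
        )

    # Same directory => same filesystem, so this is an atomic rename on POSIX.
    os.replace(staged, dest_path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "limits.conf")
        with open(target, "w") as f:
            f.write("* soft nofile 1024\n")

        good = b"* soft nofile 4096\n"
        safe_install(io.BytesIO(good), len(good), target)

        # Simulate cf-serverd being killed: only 5 of the announced bytes arrive.
        try:
            safe_install(io.BytesIO(good[:5]), len(good), target)
        except CorruptTransfer as exc:
            print(exc)

        with open(target, "rb") as f:
            print(f.read())  # still the complete version from the first call
```

Because the staged file sits next to the target, readers such as pam_limits should see either the old file or the complete new one, never a half-written one. A fully durable version would also fsync the containing directory after the rename; the sketch skips that.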

2010/11/9 Bas van der Vlies <b...@sara.nl>:
>
> On 9 nov 2010, at 08:37, Seva Gluschenko wrote:
>
>> Frans,
>>
>> since you're terminating cf-serverd in the middle of a file transfer,
>> the receiving agent reasonably treats it as corruption. There's
>> nothing wrong with that. On the other hand, why terminate cf-serverd
>> when you just need to restart cf-execd? Modify your promise and feel
>> safe.
>>
>
> I thought cfengine had some logic for transferring files (I think this is
> cfengine2 style; I did not check it for cfengine3):
>  * first copy it to <filename>.cfnew
>  * if this succeeds and the copy is correct, move it to <filename>
>
> This is to avoid exactly this kind of corruption. A server can crash, and you
> don't want the clients to suffer from it with corrupted files.
>
>
>> 2010/11/8 Frans Lawaetz <fr...@broadinstitute.org>:
>>> Hi-
>>>
>>> I recently implemented a "service cfengine3 restart" weekly cron job as a
>>> workaround for the MAX_FD bug that others and I have seen.  I neglected
>>> to except the master from the restart, so when cf-serverd was killed a number
>>> of hosts complained about in-flight transfers or about not being able to reach
>>> the master.  This is quite reasonable; however, I found one host that suffered a
>>> complete loss or corruption of its limits.conf file.  It essentially bricked
>>> the system, requiring a rebuild.
>>>
>>> Here is the sequence:
>>>
>>> cron job restarts cf3.  cf3 reports to syslog:
>>>
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  Logical start time Fri Oct 29 04:41:01 2010
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  This sub-task started really at Thu Oct 28 12:28:34 2010
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  Logical start time Thu Oct 28 12:28:34 2010
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  This sub-task started really at Thu Oct 28 12:28:34 2010
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  Logical start time Fri Oct 29 04:41:01 2010
>>> Nov  7 04:23:06 cfengine3 cf-serverd[14585]:  This sub-task started really at Thu Oct 28 12:28:34 2010
>>>
>>>
>>> cf3 on the client host emailed me at approximately the time of the restart
>>> that it had failed to copy limits.conf:
>>>
>>>
>>> date: Sun, Nov 7, 2010 at 4:23 AM
>>> subject: community [hap10.broadinstitute.org/192.168.32.34]
>>>
>>> Was not able to copy /cfengine/farm/etc/security/limits.conf.crdwga to /etc/security/limits.conf
>>> I: Made in version 'not specified' of '/var/cfengine/inputs/farm.cf' near line 279
>>>
>>>
>>> I have noticed similar failures on other hosts before, but cf3
>>> usually notes that it aborted the transaction:
>>>
>>> !! New file /etc/security/limits.conf.cfnew seems to have been corrupted in transit (dest 0 and src 1844), aborting!
>>> Was not able to copy /cfengine/farm/etc/security/limits.conf to /etc/security/limits.conf
>>>
>>> Immediately after the failure, the host in question started reporting
>>> over the network that limits.conf was corrupt:
>>>
>>> Nov  7 04:23:02 hap10 crond[13650]: pam_limits(crond:session): cannot read settings from /etc/security/limits.conf: No such file or directory
>>> Nov  7 04:23:02 hap10 crond[13650]: pam_limits(crond:session): error parsing the configuration file: '/etc/security/limits.conf'
>>>
>>> I was, of course, unable to log in to the system to investigate further, so
>>> I rebuilt it.
>>>
>>> I've since excepted the master from the weekly restart but I am alarmed that
>>> there is a use case where cf-agent can corrupt a file.  Any ideas on how
>>> this might have happened and whether there are any added safeguards that can
>>> be put in place?
>>>
>>> The client is running cfengine3-community 3.0.5 and the master is running
>>> 3.1.0b2.  Both are on CentOS 5.5 x86_64.
>>>
>>> Thanks,
>>> Frans
>>>
>>> _______________________________________________
>>> Help-cfengine mailing list
>>> Help-cfengine@cfengine.org
>>> https://cfengine.org/mailman/listinfo/help-cfengine
>>
>> --
>> SY, Seva Gluschenko.
>
> --
> Bas van der Vlies
> b...@sara.nl
>

-- 
SY, Seva Gluschenko.