On 9 Nov 2010, at 10:53, Seva Gluschenko wrote:

> Well, Cfengine does indeed report that a file was corrupted in transfer.
> It doesn't report that the old file was replaced, though, so don't
> treat the message that seriously.
>
? It was not possible to log in to the system because of a corrupted file.

> 2010/11/9 Bas van der Vlies <b...@sara.nl>:
>>
>> On 9 Nov 2010, at 10:22, Seva Gluschenko wrote:
>>
>>> Of course, Cfengine3 acts the same way: a file never gets installed
>>> directly in place of an older file.
>>>
>>
>> Maybe I am misreading the info, but the mail is about a file that gets
>> corrupted during a copy. To my knowledge this should not happen when this
>> mechanism is used.
>>
>>> 2010/11/9 Bas van der Vlies <b...@sara.nl>:
>>>>
>>>> On 9 Nov 2010, at 08:37, Seva Gluschenko wrote:
>>>>
>>>>> Frans,
>>>>>
>>>>> since you're terminating cf-serverd in the middle of a file transfer,
>>>>> the receiving agent reasonably treats it as a corruption. There's
>>>>> nothing wrong with it. On the other hand, why terminate cf-serverd
>>>>> when you just need to restart cf-execd? Modify your promise and feel
>>>>> safe.
>>>>>
>>>>
>>>> I thought cfengine had some logic for transferring files (I think this is
>>>> cfengine2 style; I did not check it for cfengine3):
>>>> * first copy it to <filename>.cfnew
>>>> * if this succeeds and it is correct, move it to <filename>
>>>>
>>>> This is to avoid corruption like this: a server can crash, and you don't
>>>> want the clients to suffer from it with files that are corrupted.
>>>>
>>>>
>>>>> 2010/11/8 Frans Lawaetz <fr...@broadinstitute.org>:
>>>>>> Hi-
>>>>>>
>>>>>> I recently implemented a "service cfengine3 restart" weekly cron job as a
>>>>>> workaround to the MAX_FD bug that others and I have seen. I neglected
>>>>>> to exclude the master from the restart, so when cf-serverd was killed a
>>>>>> number of hosts complained about in-flight transfers or not being able
>>>>>> to reach the master.
>>>>>> This is quite reasonable; however, I found one host that suffered a
>>>>>> complete loss or corruption of its limits.conf file. It essentially
>>>>>> bricked the system, requiring a rebuild.
>>>>>>
>>>>>> Here is the sequence:
>>>>>>
>>>>>> The cron job restarts cf3, and cf3 reports to syslog:
>>>>>>
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Logical start time Fri Oct 29 04:41:01 2010
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: This sub-task started really at Thu Oct 28 12:28:34 2010
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Logical start time Thu Oct 28 12:28:34 2010
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: This sub-task started really at Thu Oct 28 12:28:34 2010
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Logical start time Fri Oct 29 04:41:01 2010
>>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: This sub-task started really at Thu Oct 28 12:28:34 2010
>>>>>>
>>>>>> At approximately the time of the restart, cf3 on the client host emailed
>>>>>> me that it had failed to copy limits.conf:
>>>>>>
>>>>>> date: Sun, Nov 7, 2010 at 4:23 AM
>>>>>> subject: community [hap10.broadinstitute.org/192.168.32.34]
>>>>>>
>>>>>> Was not able to copy /cfengine/farm/etc/security/limits.conf.crdwga to /etc/security/limits.conf
>>>>>> I: Made in version 'not specified' of '/var/cfengine/inputs/farm.cf' near line 279
>>>>>>
>>>>>> I have noticed similar failures on other hosts before, but cf3 usually
>>>>>> makes a note that it aborted the transaction:
>>>>>>
>>>>>> !! New file /etc/security/limits.conf.cfnew seems to have been corrupted in transit (dest 0 and src 1844), aborting!
>>>>>> Was not able to copy /cfengine/farm/etc/security/limits.conf to /etc/security/limits.conf
>>>>>>
>>>>>> Immediately after the failure, the host in question started reporting
>>>>>> over the network that limits.conf was corrupt:
>>>>>>
>>>>>> Nov 7 04:23:02 hap10 crond[13650]: pam_limits(crond:session): cannot read settings from /etc/security/limits.conf: No such file or directory
>>>>>> Nov 7 04:23:02 hap10 crond[13650]: pam_limits(crond:session): error parsing the configuration file: '/etc/security/limits.conf'
>>>>>>
>>>>>> I was, of course, unable to log in to the system to investigate further,
>>>>>> so I rebuilt it.
>>>>>>
>>>>>> I've since excluded the master from the weekly restart, but I am alarmed
>>>>>> that there is a case where cf-agent can corrupt a file. Any ideas on how
>>>>>> this might have happened, and whether there are any added safeguards
>>>>>> that can be put in place?
>>>>>>
>>>>>> The client is running cfengine3-community 3.0.5 and the master is running
>>>>>> 3.1.0b2. Both are on CentOS 5.5 x86_64.
>>>>>>
>>>>>> Thanks,
>>>>>> Frans
>>>>>>
>>>>>> _______________________________________________
>>>>>> Help-cfengine mailing list
>>>>>> Help-cfengine@cfengine.org
>>>>>> https://cfengine.org/mailman/listinfo/help-cfengine
>>>>>>
>>>>>
>>>>> --
>>>>> SY, Seva Gluschenko.
>>>>
>>>> --
>>>> Bas van der Vlies
>>>> b...@sara.nl
>>>
>>> --
>>> SY, Seva Gluschenko.
>>
>> --
>> Bas van der Vlies
>> b...@sara.nl
>
> --
> SY, Seva Gluschenko.

--
Bas van der Vlies
b...@sara.nl
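The transfer logic Bas describes in the thread (copy to <filename>.cfnew, check it, then move it to <filename>) is the common write-to-temporary-then-rename pattern. The Python sketch below illustrates that pattern only; it is not CFEngine's code. The function name `install_received_file`, the `CorruptTransfer` exception, and the use of a byte-size comparison as the sole integrity check are assumptions made for illustration, the last chosen to mirror the "(dest 0 and src 1844)" message Frans quotes. What it shows is why the scheme should be safe: the destination is only ever touched by one final rename, after verification.

```python
import os
import tempfile


class CorruptTransfer(Exception):
    """Hypothetical: raised when the staged file does not match the announced size."""


def install_received_file(received: bytes, expected_size: int, dest: str) -> None:
    """Hypothetical sketch of the '.cfnew' staging scheme described in the thread.

    `received` stands in for the bytes that arrived over the network and
    `expected_size` for the size the server announced (an assumption: the
    real agent's integrity checks may differ).
    """
    staged = dest + ".cfnew"  # same directory, so the rename stays on one filesystem
    try:
        # Step 1: write only to the staging file; dest is never opened for writing.
        with open(staged, "wb") as f:
            f.write(received)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the swap

        # Step 2: verify the staged copy before it can replace anything.
        got = os.path.getsize(staged)
        if got != expected_size:
            # Message text modeled on the log line quoted in the thread.
            raise CorruptTransfer(
                f"New file {staged} seems to have been corrupted in transit "
                f"(dest {got} and src {expected_size}), aborting!"
            )

        # Step 3: swap it in with a single rename; os.replace overwrites dest
        # atomically on POSIX when both paths are on the same filesystem.
        os.replace(staged, dest)
    finally:
        # If os.replace was never reached, discard the partial staging file.
        if os.path.exists(staged):
            os.remove(staged)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        dest = os.path.join(d, "limits.conf")
        with open(dest, "w") as f:
            f.write("old contents\n")

        # Simulate the failure from the thread: the server announced 1844 bytes,
        # but the connection was cut before any data arrived.
        try:
            install_received_file(b"", 1844, dest)
        except CorruptTransfer as err:
            print(err)

        with open(dest) as f:
            print(f.read(), end="")  # still the old contents
```

Because the old file is replaced only by the final rename, an interrupted transfer in this sketch leaves the previous limits.conf untouched, which is why Bas expects the mechanism to rule out the empty file Frans saw. A production version would also need to carry the original file's mode and ownership over to the staged file before the rename; the sketch omits that.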