Apologies for emailing without enough coffee -- please s/cf-execd/cf-serverd/g in my comments. I'd mostly be curious whether this is easily reproducible, or simply an edge case that needs to be identified.
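One way to approach the reproducibility question is to test the invariant Bas describes further down the thread: the new file is written to <filename>.cfnew, checked, and only then renamed over the original, so an interrupted transfer should leave the old file in place. What follows is a minimal Python sketch of that scheme, under stated assumptions. It is not cfengine's code. The names `copy_via_temp`, `interrupted_stream` and `TransferInterrupted` are hypothetical. The size check only loosely mimics the "dest 0 and src 1844" message Frans quotes. The interruption is simulated by truncating the byte stream, not by sending SIGTERM to cf-serverd.

```python
import os
import tempfile


class TransferInterrupted(Exception):
    """Hypothetical error: received byte count does not match the source size."""


def copy_via_temp(chunks, expected_size, dest_path):
    """Copy-then-rename sketch (hypothetical helper, not cfengine's implementation).

    Chunks go to dest_path + ".cfnew"; the temp file is renamed over dest_path
    only if the byte count matches expected_size. On failure the temp file is
    deleted and dest_path is left untouched.
    """
    tmp_path = dest_path + ".cfnew"  # same directory, so the rename stays on one filesystem
    written = 0
    try:
        with open(tmp_path, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
                written += len(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # data on disk before the rename
        if written != expected_size:
            # Wording loosely mimics the cf3 message quoted in the thread.
            raise TransferInterrupted(
                f"{tmp_path} seems to have been corrupted in transit "
                f"(dest {written} and src {expected_size}), aborting"
            )
        os.replace(tmp_path, dest_path)  # atomic rename on POSIX
    except BaseException:
        # Cleanup runs for exceptions only; it cannot run if the process is
        # killed outright (SIGKILL, or SIGTERM with the default handler).
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def interrupted_stream(data, cut_at, chunk_size=256):
    """Yield data in chunks, stopping after cut_at bytes (simulated server loss)."""
    end = min(cut_at, len(data))
    for start in range(0, end, chunk_size):
        yield data[start:min(start + chunk_size, end)]


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        dest = os.path.join(workdir, "limits.conf")
        with open(dest, "wb") as f:
            f.write(b"old contents\n")
        new = b"x" * 1844  # size chosen to match "src 1844" in the quoted message

        # 1. Transfer cut off before any data arrives: must abort, old file stays.
        try:
            copy_via_temp(interrupted_stream(new, 0), len(new), dest)
        except TransferInterrupted as err:
            print("aborted:", err)
        with open(dest, "rb") as f:
            survived = f.read()
        leftover = os.path.exists(dest + ".cfnew")

        # 2. Complete transfer: new contents are renamed into place.
        copy_via_temp(interrupted_stream(new, len(new)), len(new), dest)
        with open(dest, "rb") as f:
            final = f.read()
        return survived, leftover, final


if __name__ == "__main__":
    survived, leftover, final = demo()
    print(survived, leftover, len(final))
```

If the scheme behaves as described, an aborted transfer leaves at worst a stray .cfnew file (when the process is killed before cleanup), never a missing destination. So the "No such file or directory" error from pam_limits would suggest the failure happened somewhere outside this rename step. A reproduction attempt would need to pin that down.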
On 11/9/10 9:16 AM, "Mike Hoskins" <micho...@cisco.com> wrote:

> If he'd kill -9'd cf-execd, I'd expect corruption. Since the output he
> pasted looked like it was a signal 15, I would have expected it to be caught
> and cleaned up after (e.g. finish in-progress transfers). Further, the cf2
> behavior of copying to a temp file and then moving into place does still
> work from his output as well...it showed the copy to a temporary location.
> This seems to be all the more reason to ask -- why was there any corruption?
>
> Sure, the promise could be modified, or the schedule staggered to avoid the
> issue...but those are workarounds vs. real solutions, it seems.
>
> Question: Can the condition be reliably reproduced by just kill -15'ing
> cf-execd with in-progress file transfers?
>
> If yes, then that would seem bad to me and should get reported as a bug. The
> kill -9 (hard stop) vs. kill -15 (graceful shutdown) thing is not cfengine
> specific; it's how many UNIX programs work. If signal 15s can cause
> corruption, it may very well catch people off guard.
>
> On 11/9/10 1:22 AM, "Seva Gluschenko" <seva.glusche...@gmail.com> wrote:
>
>> Of course, Cfengine3 acts the same way; the file never gets installed
>> directly in place of an older file.
>>
>> 2010/11/9 Bas van der Vlies <b...@sara.nl>:
>>>
>>> On 9 nov 2010, at 08:37, Seva Gluschenko wrote:
>>>
>>>> Frans,
>>>>
>>>> since you're terminating cf-serverd in the middle of a file transfer,
>>>> the receiving agent reasonably treats it as a corruption. There's
>>>> nothing wrong with it. On the other hand, why terminate cf-serverd
>>>> when you just need to restart cf-execd? Modify your promise and feel
>>>> safe.
>>>>
>>>
>>> I thought cfengine has some logic for transferring files (I think this is
>>> cfengine2 style; I did not check it for cfengine3):
>>> * first copy it to <filename>.cfnew
>>> * if this succeeds and it is correct, move it to <filename>
>>>
>>> This is to avoid corruption like this.
>>> A server can crash, and you don't want
>>> the clients to suffer from files that are corrupted.
>>>
>>>
>>>> 2010/11/8 Frans Lawaetz <fr...@broadinstitute.org>:
>>>>> Hi-
>>>>>
>>>>> I recently implemented a "service cfengine3 restart" weekly cron job as a
>>>>> workaround to the MAX_FD bug that others and I have seen. I neglected
>>>>> to except the master from the restart, so when cf-serverd was killed a
>>>>> number of hosts complained about in-flight transfers or not being able
>>>>> to reach the master. This is quite reasonable; however, I found one host
>>>>> that suffered a complete loss or corruption of its limits.conf file. It
>>>>> essentially bricked the system, requiring a rebuild.
>>>>>
>>>>> Here is the sequence:
>>>>>
>>>>> cron job restarts cf3. cf3 reports to syslog:
>>>>>
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Logical start time Fri Oct 29 04:41:01 2010
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: This sub-task started really at Thu Oct 28 12:28:34 2010
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Logical start time Thu Oct 28 12:28:34 2010
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: This sub-task started really at Thu Oct 28 12:28:34 2010
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Received signal 15 (SIGTERM) while doing [lock.independent.server_cfengine.-cfengine3.the_server_daemon_2542_MD5=5b2c904169606aa9b27ec369fd13e016]
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: Logical start time Fri Oct 29 04:41:01 2010
>>>>> Nov 7 04:23:06 cfengine3 cf-serverd[14585]: This sub-task started really at Thu Oct 28 12:28:34 2010
>>>>>
>>>>>
>>>>> cf3 on the client host emailed me at approximately the time of the restart
>>>>> that it failed to copy limits.conf:
>>>>>
>>>>>
>>>>> date: Sun, Nov 7, 2010 at 4:23 AM
>>>>> subject: community [hap10.broadinstitute.org/192.168.32.34]
>>>>>
>>>>> Was not able to copy /cfengine/farm/etc/security/limits.conf.crdwga to /etc/security/limits.conf
>>>>> I: Made in version 'not specified' of '/var/cfengine/inputs/farm.cf' near line 279
>>>>>
>>>>>
>>>>> I have noticed similar failures on other hosts before, but cf3
>>>>> usually makes a note that it aborted the transaction:
>>>>>
>>>>> !! New file /etc/security/limits.conf.cfnew seems to have been corrupted in transit (dest 0 and src 1844), aborting!
>>>>> Was not able to copy /cfengine/farm/etc/security/limits.conf to /etc/security/limits.conf
>>>>>
>>>>> Immediately after the failure, the host in question started reporting
>>>>> over the network that limits.conf was corrupt:
>>>>>
>>>>> Nov 7 04:23:02 hap10 crond[13650]: pam_limits(crond:session): cannot read settings from /etc/security/limits.conf: No such file or directory
>>>>> Nov 7 04:23:02 hap10 crond[13650]: pam_limits(crond:session): error parsing the configuration file: '/etc/security/limits.conf'
>>>>>
>>>>> I was of course unable to log in to the system to investigate further, so
>>>>> I rebuilt it.
>>>>>
>>>>> I've since excepted the master from the weekly restart, but I am alarmed
>>>>> that there is a use case where cf-agent can corrupt a file. Any ideas on
>>>>> how this might have happened and whether there are any added safeguards
>>>>> that can be put in place?
>>>>>
>>>>> The client is running cfengine3-community 3.0.5 and the master is running
>>>>> 3.1.0b2. Both are on CentOS5.5 x86_64.
>>>>>
>>>>> Thanks,
>>>>> Frans

_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine