Forum: Cfengine Help
Subject: Is locking broken, or do I misunderstand?
Author: sauer
Link to topic: https://cfengine.com/forum/read.php?3,22374,22374#msg-22374

So, I have a promise to edit a bunch of files.  Around 30K files. And it takes 
about 20-30 minutes to verify.  What I wanted to do was set that to run in the 
background so other more time-critical promises would continue to be evaluated 
in a timely fashion.  Ignoring that I think there is probably some room to 
improve performance of file editing promises here, I'd like to know why my code 
isn't working.  What happens is that one cf-agent starts on the bundle that 
contains the file editing promises, but then 5 minutes later another one 
starts up, and then the "we really screwed up bdb handling in 3.1.5" bugs start 
coming out as there's contention for the cflock.db.  Stuff starts going slower, 
and before you know it (overnight) I've accumulated over 300 cf-agent processes 
running, my load average is over 95 (that's 9500%, not 95%), and 
/var/cfengine/outputs is getting a tad big. :)

So, while there are some bugs in the bdb code which should probably be fixed 
(this did not happen as badly with 3.1.4, and doesn't happen at all with 
3.0.4), I'm on the list to find out why the locks don't stop this method from 
being evaluated more than once to begin with.  With 3.0.4, even though the lock 
problems didn't make a huge mess, I'd still get 3-4 cf-agents all working on 
different files in the file edit promises.  Here's what I have:


...
  methods:
    found_executables::
    "fix_perl_paths"
      action    => measure_webmin_bg("240"), # check in background, 4 hours
      usebundle => app_webmin_fix_executables( "$(executable_list)" );
...

body action measure_webmin_bg(delay) {
  measurement_class => "Detect changes in $(this.promiser)";
  ifelapsed   => "$(delay)";
  expireafter => "$(delay)";
  background  => "false";
}


So, in the "main" webmin bundle, I have a methods section which promises to 
validate the fix_perl_paths bundle if we found any perl scripts under the 
webmin directory (the found_webmin class is set).  The $(executable_list) 
variable is an slist containing the names of about 10 other slists which get 
joined together in the app_webmin_fix_executables bundle.  I do that, 
incidentally, because the module protocol has a maximum line length which 
doesn't allow me to create a long enough list to have all of the files in a 
single returned list, and I have to use the module protocol with a find to 
generate the list of executable names because the line editing code has 
introduced an arbitrary limitation which prevents me from just saying "edit all 
of the *.cgi files below this directory" to begin with.

Anywho...  It was my impression that adding this action to the fix_perl_paths 
methods promise would delay its execution to once every 4 hours.  And it seems 
to sort of work; it delays new executions until 4 hours after the last one 
finished.  However, it was also my impression that there would be a lock placed 
on evaluation of the methods promise, so that a subsequent invocation of 
cf-agent would not also attempt to "help" evaluate that method.  This does not 
appear to be the case.  What appears to be happening is that there is no lock 
at all on that promise, and that the elapsed time isn't recorded until the 
promise is finished verifying (that second part makes sense).  The locks appear 
to be placed on the individual file editing promises inside the fix_ bundle, 
and when I end up having more than one cf-agent fairly rapidly trying to lock 
and unlock each file edit, things just spiral out of control.  In other words, 
the methods: promise doesn't look like it gets locked; if multiple cf-agent 
processes start up, they all enter the bundle and start editing files in 
parallel.

I had initially thought that the backgrounding was causing the problem (which 
is why that's false now), but it's not.  Though, assigning this action to the 
file edit promises instead of the outer method promise and setting background 
to true did result in some amusingly bad behavior. :)

Anyway, anyone have thoughts on how I can make cfengine only evaluate this 
whole method in *one* cf-agent process?  Perhaps set a persistent class and 
skip the method if the class is still set, essentially implementing another set 
of locks?  It does seem possible, since the list of files is split up into 
several blocks, that what's really getting locked is the evaluation of the 
bundle with a given element of the slist.  Is that how the locking works?  So 
maybe I need to pass the list in (or the name of the list), rather than 
iterating over the list?
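
For what it's worth, the persistent-class idea could look roughly like the 
following.  This is an untested sketch, not a confirmed fix: the class names 
may_run_fix and webmin_fix_in_progress, the bundle webmin_mark_busy, the body 
persist_busy, and the use of /bin/true as a no-op command are all made up for 
illustration.  It assumes that a successful commands promise counts as 
repaired, and that persistence in a classes body behaves in 3.1.x the way the 
reference manual describes.

```cfengine3
# Fragment for the existing "main" webmin bundle (names are hypothetical).

  classes:
    # Must come after whatever defines found_executables.  Decided once,
    # before the marker class is set below, so that this agent still runs
    # the fix while agents started later skip it.
    "may_run_fix" expression => "found_executables.!webmin_fix_in_progress";

  methods:
    may_run_fix::
    "mark_busy"
      usebundle => webmin_mark_busy;

    "fix_perl_paths"
      usebundle => app_webmin_fix_executables( "$(executable_list)" );

# Helper bundle: sets the persistent marker class before the slow work starts.
bundle agent webmin_mark_busy
{
  commands:
    # No-op command; its only job is to trigger the classes body.
    "/bin/true"
      classes => persist_busy;
}

body classes persist_busy
{
  promise_repaired => { "webmin_fix_in_progress" };
  persistence      => "240";     # minutes, matching the 4 hour delay
  timer_policy     => "reset";
}
```

The marker class simply expires after 240 minutes whether or not the run has 
finished, so in this form it doubles as the ifelapsed delay.  There is still a 
small window between the classes check and the marker being set in which a 
second cf-agent could slip through.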

I really want to just fork this and let it run in the background without 
impacting the things that I actually do need to validate every 5 minutes.  Or, 
I suppose I can replace it with a "find | xargs sed", but darn it, I want to 
have cfengine's reporting and promise validation, which is why I'm using the 
product in the first place. :)
