Forum: Cfengine Help
Subject: Is locking broken, or do I misunderstand?
Author: sauer
Link to topic: https://cfengine.com/forum/read.php?3,22374,22374#msg-22374
So, I have a promise to edit a bunch of files. Around 30K files. And it takes
about 20-30 minutes to verify. What I wanted to do was set that to run in the
background so other more time-critical promises would continue to be evaluated
in a timely fashion. Ignoring that I think there is probably some room to
improve performance of file editing promises here, I'd like to know why my code
isn't working. What happens is that one cf-agent starts on the bundle that
contains the file editing promises, but then 5 minutes later another one
starts up, and then the "we really screwed up bdb handling in 3.1.5" bugs start
coming out as there's contention for the cflock.db. Stuff starts going slower,
and before you know it (overnight) I've accumulated over 300 cf-agent processes
running, my load average is over 95 (that's 9500%, not 95%), and
/var/cfengine/outputs is getting a tad big. :)
So, while there are some bugs in the bdb code which should probably be fixed
(this did not happen as badly with 3.1.4, and doesn't happen at all with
3.0.4), I'm on the list to find out why the locks don't stop this method from
being evaluated more than once to begin with. With 3.0.4, even though the lock
problems didn't make a huge mess, I'd still get 3-4 cf-agents all working on
different files in the file edit promises. Here's what I have:
...
methods:
  found_executables::
    "fix_perl_paths"
      action => measure_webmin_bg("240"), # check in background, 4 hours
      usebundle => app_webmin_fix_executables( "$(executable_list)" );
...
body action measure_webmin_bg(delay) {
  measurement_class => "Detect changes in $(this.promiser)";
  ifelapsed => "$(delay)";
  expireafter => "$(delay)";
  background => "false";
}
So, in the "main" webmin bundle, I have a methods section which promises to
validate the fix_perl_paths bundle if we found any perl scripts under the
webmin directory (the found_executables class is set). The $(executable_list)
variable is an slist containing the names of about 10 other slists which get
joined together in the app_webmin_fix_executables bundle. I do that,
incidentally, because the module protocol has a maximum line length which
doesn't allow me to create a long enough list to have all of the files in a
single returned list, and I have to use the module protocol with a find to
generate the list of executable names because the line editing code has
introduced an arbitrary limitation which prevents me from just saying "edit all
of the *.cgi files below this directory" to begin with.
Anywho... It was my impression that adding this action to the fix_perl_paths
methods promise would delay its execution to once every 4 hours. And it seems
to sort of work; it delays new executions until 4 hours after the last one
finished. However, it was also my impression that there would be a lock placed
on evaluation of the methods promise, so that a subsequent invocation of
cf-agent would not also attempt to "help" evaluate that method. This does not
appear to be the case. What appears to be happening is that there is no lock
at all on that promise, and that the elapsed time isn't recorded until the
promise is finished verifying (that second part makes sense). The locks appear
to be placed on the individual file editing promises inside the fix_ bundle,
and when I end up having more than one cf-agent fairly rapidly trying to lock
and unlock each file edit, things just spiral out of control. In other words,
the methods: promise doesn't look like it gets locked; if multiple cf-agent
processes start up, they all enter the bundle and start editing files in
parallel.
I had initially thought that the backgrounding was causing the problem (which
is why that's false now), but it's not. Though, assigning this action to the
file edit promises instead of the outer method promise and setting background
to true did result in some amusingly bad behavior. :)
Anyway, anyone have thoughts on how I can make cfengine only evaluate this
whole method in *one* cf-agent process? Perhaps set a persistent class and
skip the method if the class is still set, essentially implementing another set
of locks? It does seem possible that, since the list of files is split up into
several blocks, what's really getting locked is the evaluation of the
bundle with a given element of the slist. Is that how the locking works? So
maybe I need to pass the list in (or the name of the list), rather than
iterating over the list?
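For what it's worth, the persistent-class idea might look roughly like the
fragment below. This is an untested sketch, not a known-good fix: the class
names webmin_fix_due and webmin_fix_locked are made up for illustration, and it
assumes the persistence attribute on classes promises is available in the
version in use (I have not confirmed that for 3.1.5). It goes in the same
bundle as the methods promise above, so found_executables is in scope.

```cf3
classes:
    # Hypothetical guard: due only if executables were found and no
    # earlier cf-agent run has already claimed the work.
    "webmin_fix_due" expression => "found_executables.!webmin_fix_locked";

  webmin_fix_due::
    # Claim the work before the method starts. The class persists for
    # 240 minutes, so cf-agent runs started in that window skip the method.
    "webmin_fix_locked" expression => "any",
                        persistence => "240";

methods:
  webmin_fix_due::
    "fix_perl_paths"
      action => measure_webmin_bg("240"),
      usebundle => app_webmin_fix_executables( "$(executable_list)" );
```

Note that the 240 minutes here count from the start of the run rather than
from when the edits finish, and there is still a small window in which two
agents could both see webmin_fix_locked unset before either one persists it.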
I really want to just fork this and let it run in the background without
impacting the things that I actually do need to validate every 5 minutes. Or,
I suppose I can replace it with a "find | xargs sed", but darn it, I want to
have cfengine's reporting and promise validation, which is why I'm using the
product in the first place. :)