Forum: Cfengine Help
Subject: Is locking broken, or do I misunderstand?
Author: sauer
Link to topic: https://cfengine.com/forum/read.php?3,22374,22374#msg-22374
So, I have a promise to edit a bunch of files. Around 30K files. And it takes about 20-30 minutes to verify. What I wanted to do was set that to run in the background so that other, more time-critical promises would continue to be evaluated in a timely fashion. Ignoring that I think there is probably some room to improve the performance of file editing promises here, I'd like to know why my code isn't working.

What happens is that one cf-agent starts on the bundle that contains the file editing promises, but then 5 minutes later another one starts up, and then the "we really screwed up bdb handling in 3.1.5" bugs start coming out as there's contention for cflock.db. Stuff starts going slower, and before you know it (overnight) I've accumulated over 300 running cf-agent processes, my load average is over 95 (that's 9500%, not 95%), and /var/cfengine/outputs is getting a tad big. :)

So, while there are some bugs in the bdb code which should probably be fixed (this did not happen as badly with 3.1.4, and doesn't happen at all with 3.0.4), I'm on the list to find out why the locks don't stop this method from being evaluated more than once to begin with. With 3.0.4, even though the lock problems didn't make a huge mess, I'd still get 3-4 cf-agents all working on different files in the file edit promises.

Here's what I have:

    ...
    methods:
      found_executables::
        "fix_perl_paths"
          action => measure_webmin_bg("240"), # check in background, 4 hours
          usebundle => app_webmin_fix_executables( "$(executable_list)" );
    ...
    body action measure_webmin_bg(delay)
    {
    measurement_class => "Detect changes in $(this.promiser)";
    ifelapsed => "$(delay)";
    expireafter => "$(delay)";
    background => "false";
    }

So, in the "main" webmin bundle, I have a methods section which promises to validate the fix_perl_paths bundle if we found any perl scripts under the webmin directory (the found_webmin class is set).
The $(executable_list) variable is an slist containing the names of about 10 other slists which get joined together in the app_webmin_fix_executables bundle. I do that, incidentally, because the module protocol has a maximum line length which doesn't allow me to create a list long enough to hold all of the files in a single returned list. And I have to use the module protocol with a find to generate the list of executable names because the line editing code has introduced an arbitrary limitation which prevents me from just saying "edit all of the *.cgi files below this directory" to begin with.

Anywho... It was my impression that adding this action to the fix_perl_paths methods promise would limit its execution to once every 4 hours. And it seems to sort of work; it delays new executions until 4 hours after the last one finished. However, it was also my impression that there would be a lock placed on evaluation of the methods promise, so that a subsequent invocation of cf-agent would not also attempt to "help" evaluate that method. This does not appear to be the case.

What appears to be happening is that there is no lock at all on that promise, and that the elapsed time isn't recorded until the promise has finished verifying (that second part makes sense). The locks appear to be placed on the individual file editing promises inside the fix_ bundle, and when I end up having more than one cf-agent fairly rapidly trying to lock and unlock each file edit, things just spiral out of control. In other words, the methods: promise doesn't look like it gets locked; if multiple cf-agent processes start up, they all enter the bundle and start editing files in parallel.

I had initially thought that the backgrounding was causing the problem (which is why that's false now), but it's not. Though, assigning this action to the file edit promises instead of the outer method promise and setting background to true did result in some amusingly bad behavior.
:)

Anyway, anyone have thoughts on how I can make cfengine only evaluate this whole method in *one* cf-agent process? Perhaps set a persistent class and skip the method if the class is still set, essentially implementing another set of locks?

It does seem possible, since the list of files is split up into several blocks, that what's really getting locked is the evaluation of the bundle with a given element of the slist. Is that how the locking works? So maybe I need to pass the list in (or the name of the list), rather than iterating over the list?

I really want to just fork this and let it run in the background without impacting the things that I actually do need to validate every 5 minutes. Or, I suppose I can replace it with a "find | xargs sed", but darn it, I want to have cfengine's reporting and promise validation, which is why I'm using the product in the first place. :)
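For anyone who wants the "another set of locks" idea spelled out, here is a minimal sketch in Python rather than cfengine policy. It assumes a POSIX system with flock(); the function name run_exclusively, the lock file, and the return strings are made up for illustration and say nothing about how cf-agent handles its own locks. The behaviour it models is the one asked for above: one lock covers the whole batch, a second process that finds the lock held skips the work instead of joining in, and the completion time is recorded only once the job has finished.

```python
import fcntl
import os
import time


def run_exclusively(lock_path, min_interval_s, job, now=time.time):
    """Run job() at most once at a time and at most once per interval.

    Returns "busy" if another holder has the lock, "too-soon" if the
    last completed run is newer than min_interval_s, otherwise "ran".
    All names here are hypothetical; this is not a cfengine API.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fail at once if it is held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "busy"

        # The lock file doubles as the "last finished" timestamp.
        raw = os.read(fd, 64).decode().strip()
        last_done = float(raw) if raw else 0.0
        if now() - last_done < min_interval_s:
            return "too-soon"

        job()  # the whole batch runs under the single outer lock

        # Record the completion time only after the job has finished.
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, repr(now()).encode())
        return "ran"
    finally:
        # Closing the descriptor releases the flock, even on error.
        os.close(fd)
```

A caller would wrap the long-running work in something like run_exclusively("/tmp/fix_perl_paths.lock", 240 * 60, do_edits), where the path and do_edits are placeholders. Because the kernel drops the lock when the process exits, a crashed run does not leave a stale lock behind, which is the main reason for choosing flock() over a plain marker file in this sketch.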