On 6/7/11 10:00 AM, "Mikhail Gusarov" <mikhail.gusa...@cfengine.com> wrote:

> On 06/07/2011 06:56 PM, Michael Stevens wrote:
>> I'm intermittently getting these errors on some of my hosts ... I'm only
>> getting them from automated runs, and have been unsuccessful trying to provoke
>> the errors running cf-agent manually. Any idea what causes this and how I fix
>> it? All the machines are getting the same HOFFSET errors, but with different
>> numbers:
>>
>> Page 5: bad HOFFSET 1552, appears to be 1596
>> Page 5: item order check unsafe: skipping
>> /var/cfengine/state/cf_lock.db: DB_VERIFY_BAD: Database verification failed
>> BDB_VerifyDB: database /var/cfengine/state/cf_lock.db is corrupted:
>> DB_VERIFY_BAD: Database verification failed
>>
>> /var/cfengine/state/cf_lock.db: unable to flush: No such file or directory
>> BDB_CloseDB: Unable to close database: No such file or directory
>> CloseDB: Could not close DB handle.
>> CloseDB: Trying to remove handle from open pool anyway.
>> /var/cfengine/state/cf_lock.db: No such file or directory
>>
>> The "corrupted" lockfile has a timestamp matching the time of the errors
>> above as seen in the system log:
>> # ls -l /var/cfengine/state/cf_lock.db*
>> -rw------- 1 root root 32768 Jun 7 09:47 /var/cfengine/state/cf_lock.db
>> -rw------- 1 root root 32768 Jun 7 07:47 /var/cfengine/state/cf_lock.db.corrupted
>
> This has been an attempt to detect corrupted databases and set them
> aside in 3.1.5. It turned out to create more problems than it solved
> (the recovery process might get in the way and become a cause of database
> corruption due to 1) its length and 2) the lack of locks guarding verification),
> and it has been reverted in trunk. You could try to use trunk or get a
> patch which removes BDB_VerifyDB from dbm_berkeley.c
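One reason given above for reverting the 3.1.5 set-aside logic is that nothing serialized verification against other processes. Below is a minimal sketch of a lock-guarded "verify, then move aside" step for a local workaround script. It assumes a POSIX host (for `fcntl.flock`) and a hypothetical `verify` callback that returns True when the database file is intact. The function name `quarantine_if_corrupt` and the `.verify.lock` sidecar file are illustrative only, not part of cfengine. The lock only excludes other callers that take the same lock; it does not stop cf-agent itself from opening the file mid-check.

```python
import fcntl
import os
import time
from typing import Callable, Optional


def quarantine_if_corrupt(
    db_path: str,
    verify: Callable[[str], bool],
    lock_path: Optional[str] = None,
) -> bool:
    """Move db_path aside if verify() reports it corrupt.

    Returns True if the file was moved aside, False otherwise.
    Hypothetical helper: verify is supplied by the caller.
    """
    # Sidecar lock file shared by every cooperating caller (an assumption).
    lock_path = lock_path or db_path + ".verify.lock"
    with open(lock_path, "w") as lock_file:
        # Exclusive lock: only one cooperating process verifies/renames at a time.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Nothing to do if the file is missing or passes verification.
            if not os.path.exists(db_path) or verify(db_path):
                return False
            # Nanosecond suffix so an earlier .corrupted copy is not overwritten.
            aside = f"{db_path}.corrupted.{time.time_ns()}"
            os.replace(db_path, aside)  # atomic rename on the same filesystem
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

In practice `verify` might wrap the Berkeley DB `db_verify` utility and check its exit status (the binary name varies by distribution); that wiring is an assumption, not something described in this thread.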
Alas, it seems to be part of life with BDB; it was an issue at times in 2.x as well (and I have a few ugly hacks to work around it, since with thousands of hosts "at times" is too often to address manually). I'm sad to hear an official fix has been backed out, though avoiding making the problem worse sounds sane. I must admit it looks a lot like my hack. :-)

Having said that, will more research be done in this area, and will a robust "self-healing" mechanism eventually be implemented? Checksums are an essential feature for us, and one of the reasons we selected cfengine. It wasn't a primary factor in our decision, but "getting that stuff for free" really made InfoSec happy. I would love to know this stuff is getting adequate attention in cf3.

_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine