Bugs item #857909, was opened at 2003-12-10 14:41
Message generated for change (Comment added) made by brandonh
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=857909&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: Python 2.3
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Predrag Miocinovic (predragm)
Assigned to: Gregory P. Smith (greg)
Summary: bsddb craps out sporadically

Initial Comment:
I get the following from Python2.3.2 with BerkeleyDB 3.3.11 running on linux RH7.3;

------------------------
Traceback (most recent call last):
  File "/raid/ANITA-lite/gse/unpackd.py", line 702, in ?
    PacketObject.shelve()
  File "/raid/ANITA-lite/gse/unpackd.py", line 78, in shelve
    wvShelf[shelfKey] = self
  File "/usr/local/lib/python2.3/shelve.py", line 130, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/local/lib/python2.3/bsddb/__init__.py", line 120, in __setitem__
    self.db[key] = value
bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
Exception bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery') in ignored
Exception bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery') in ignored
----------------------------------

The server reporting this is running at relatively heavy load and the error occurs several times per day (this call occurs roughly 100,000 times per day, but only 42 times for any given shelve instance). It reminds me of bug report #775414, but this is a non-threaded application. That said, another process is accessing the same shelve, but I've implemented a lockout system which should make sure they don't have simultaneous access. The lockout seems to work fine.

The same application is running on a different machine using Python2.3.2 with BerkeleyDB 4.0.14 on linux RH9 and the same error occurred once (to my knowledge), but with "30987" replaced by "30981" in the traceback above, if it makes any difference.
Finally, a third system, python2.3.2 with BerkeleyDB 4.0.14 on linux RH9 (but quite a bit faster, and thus under lighter load) has run w/o reporting this problem so far.

I don't have a convenient code snippet to exemplify the problem, but I don't do anything more than open (or re-open) a shelve and write a single python object instance to it per opening. If necessary, I can provide the code in question.

----------------------------------------------------------------------

Comment By: Brandon Hechinger (brandonh)
Date: 2005-12-08 01:33

Message:
Logged In: YES
user_id=226421

We also get this error, though using C rather than Python. I'm not sure why people are so eager to dismiss it as an issue here, however, for it might be something your Python is doing with the Berkeley DB interface which could be improved.

In our case, there is a similarity -- the site accesses the database(s) at relatively high frequency, and we use our own locking system to prevent any conflict (allowing multiple readers and exclusive writers -- writers do not so much as generate a path to the database, let alone open it, until they obtain the separate lock handled by our software).

Periodically one of the databases will have an error when reading a key, and this error will remain until the database is repaired. The error return code is -30987.

It's not 100% conclusive whether it happens primarily on frequently accessed databases, and if it does, it is not clear whether the high volume of access causes the error or merely increases the likelihood of encountering it.

In any case, our locking mechanisms (we've tried more than one) do lock prior to the database being opened at all, and are handled in a multi-reader single-writer way.

Again, it's not clear if it's a Berkeley DB problem, or a problem with the *way* we are accessing/using Berkeley DB.
Until this is known, I don't think it should be so quickly blown off as not being a Python issue -- even if a bit of the Python project's resources went into finding a Berkeley DB problem, would it result in such a bad world? :)

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2004-06-16 15:50

Message:
Logged In: YES
user_id=413

DB_RUNRECOVERY errors are a Sleepycat BerkeleyDB internal error and don't have anything to do with the python library wrapper. For help in tracking them down I suggest using the latest BerkeleyDB version and asking, with example code, on the berkeleydb newsgroups or asking Sleepycat themselves (they don't bite, they're friendly).

Closing this bug as it's not a python or extension module bug. If you're looking for multiprocess-aware BerkeleyDB shelve support, that should be a feature request (ideally with an example implementation :).

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2003-12-22 16:04

Message:
Logged In: YES
user_id=250749

I can sympathise with your POV, but shelve has a documented restriction that it is not supported for multiuser use without specific external support - that is, multiple readers are OK, but writing requires exclusive access to the shelve database.

As you are using it in such an environment, it is up to you to guarantee the required safety. The error being reported is highly likely to be a consequence of your locking scheme being inadequate for use with the BerkeleyDB environment, at least on that system, and my suggestion that you take this up in a BerkeleyDB forum was directed at you getting sufficient information to improve your locking scheme to avoid the problem you see.
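
For readers following the locking discussion, here is a minimal sketch of what "exclusive access for writers, shared access for readers" could look like around a shelve. It is not the submitter's lockout system, which was never posted, and it makes no claim about whether such a scheme would have avoided the DB_RUNRECOVERY error. It assumes modern Python 3 on a POSIX system (the thread itself concerns Python 2.3 with bsddb); `locked_shelf` and the `.lock` sidecar file are hypothetical names chosen for illustration.

```python
import contextlib
import fcntl
import shelve


@contextlib.contextmanager
def locked_shelf(path, writer=True):
    """Open the shelf at `path` only while holding a lock on a sidecar file.

    Hypothetical helper: writers take an exclusive lock, readers a shared one.
    flock() locks are advisory, so every process must go through this helper.
    """
    lock_path = path + ".lock"  # hypothetical sidecar lock file
    with open(lock_path, "a") as lock_file:
        # Blocks until the lock is granted.
        fcntl.flock(lock_file, fcntl.LOCK_EX if writer else fcntl.LOCK_SH)
        try:
            shelf = shelve.open(path, flag="c" if writer else "r")
            try:
                yield shelf
            finally:
                # Close (and flush) the database before giving up the lock.
                shelf.close()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The point of the design is that the lock is taken before the database file is opened and released only after it has been closed, so no other cooperating process can open the shelf while a write is still being flushed.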
I think you are a little optimistic in expecting the shelve module (let alone the anydbm module) to cope with exceptions arising from use outside its documented restrictions - and BerkeleyDB supports lots of capability beyond the scope of the functionality used by shelve and anydbm, with the exceptions to go with that.

If you care about the shelve storage format, you can force the type of storage by creating an empty database of the format of your choice, provided that that format is supported by anydbm. With a bit of care, you should be able to convert a shelve from one format to another, within the anydbm format support restriction.

While it might be nice to have some format control, anydbm's purpose is to hide the database format/interface. If you care about the format, you're expected to use the desired interface module directly.

----------------------------------------------------------------------

Comment By: Predrag Miocinovic (predragm)
Date: 2003-12-21 20:48

Message:
Logged In: YES
user_id=860222

I find the last comment somewhat unsatisfactory. While this appears to be a BerkeleyDB issue (and w/o going into details of why the exception gets thrown), it's strange that the Shelve module doesn't handle this more gracefully. Since the concept of Shelve is hiding implementation details from the application, having to catch BerkeleyDB exceptions for simple shelf operations is a bit over the top. If I move to another system, using a different underlying DB (as given by anydbm), will I have to catch a new set of exceptions all over again? Shelve (or anydbm) should either provide the ability to select the underlying DB implementation to use, or it should handle all DB implementation aspects so that it is truly transparent to the end user.

Just my $0.02.

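
The two points in this exchange (choosing the storage format by using an interface module directly, and not having to name backend-specific exceptions) can be sketched as follows. This is an illustration under assumptions, not what either poster ran: it uses Python 3 names (`dbm` and `dbm.dumb`, where the thread discusses `anydbm` on Python 2.3), picks `dbm.dumb` only because it is always available, and `open_dumb_shelf` and `store` are hypothetical helpers.

```python
import dbm
import dbm.dumb
import shelve


def open_dumb_shelf(path):
    """Back a shelf explicitly with dbm.dumb instead of letting dbm pick."""
    return shelve.Shelf(dbm.dumb.open(path, "c"))


def store(path, key, value):
    """Write one object; report backend failures without naming the backend."""
    try:
        shelf = open_dumb_shelf(path)
        try:
            shelf[key] = value
        finally:
            shelf.close()
    except dbm.error as exc:  # dbm.error is a tuple of exception classes
        print("shelf write failed:", exc)
        return False
    return True
```

Catching the generic `dbm.error` tuple only reports the failure; it does not repair a database that the underlying library considers damaged.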
----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2003-12-21 03:50

Message:
Logged In: YES
user_id=250749

As far as I can make out, what you're seeing is a BerkeleyDB issue, and bsddb is just reporting what BDB is telling it.

DB_RUNRECOVERY (-30987 on DB 3.3, -30981 on DB 4.0) is documented as (quoted from DB4.0 HTML docs):

"There exists a class of errors that Berkeley DB considers fatal to an entire Berkeley DB environment. An example of this type of error is a corrupted database or a log write failure because the disk is out of free space. The only way to recover from these failures is to have all threads of control exit the Berkeley DB environment, run recovery of the environment, and re-enter Berkeley DB."

Therefore I think you should follow this up in a BerkeleyDB forum.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list