New submission from johansen:

We've been using Python 2.4 to build the new package management software for OpenSolaris. We use an ndbm database to hold keywords about packages, and found that each time we added a new OpenSolaris build to our package repository, the time to import would increase by about 1/3 of the previous time.

It turns out that we were continually invoking a function in the dbmmodule that walked the entire database every time the function was called.

Looking at dbmmodule.c, the source for dbm.so, is instructive:

This is dbm_length, the function that we're _always_ calling.

    static int
    dbm_length(dbmobject *dp)
    {
        if (dp->di_dbm == NULL) {
            PyErr_SetString(DbmError, "DBM object has already been closed");
            return -1;
        }
        if ( dp->di_size < 0 ) {
            datum key;
            int size;

            size = 0;
            for ( key=dbm_firstkey(dp->di_dbm); key.dptr;
                  key = dbm_nextkey(dp->di_dbm))
                size++;
            dp->di_size = size;
        }
        return dp->di_size;
    }

It's a knock-off of the function shown in ndbm(3C) that traverses the database. It looks like this function walks every record in the database, and then returns that as its size.

Further examination of dbmmodule shows that dbm_length has been assigned as the function for the inquiry operator:

    static PyMappingMethods dbm_as_mapping = {
        (inquiry)dbm_length,            /*mp_length*/
        (binaryfunc)dbm_subscript,      /*mp_subscript*/
        (objobjargproc)dbm_ass_sub,     /*mp_ass_subscript*/
    };

It looks like dbm_length stashes the size of the database, so it doesn't always have to traverse it. However, further examination of the source shows that an insertion doesn't update the di_size counter. Worse yet, an update or a delete will cause the counter to be set to -1. This means that the next call to dbm_length will have to walk the entire database all over again. Ick.

One of the problem parts of the code is this line in update_searchdb() in catalog.py:

    if fmri_list:
            if not self.searchdb:
                    self.searchdb = \
                        dbm.open(self.searchdb_file, "c")

This "if not" triggers the PyObject_IsTrue that invokes the inquiry operator for the dbm module. Every time we run this code, we're going to walk the entire database.
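
To illustrate the effect described above, here is a minimal pure-Python sketch. It is not the real dbm module: WalkCountingDB, import_with_truth_test and import_with_none_test are hypothetical names made up for this example, and the class simply resets its cached size on every store, which is a simplification of the di_size handling in dbmmodule.c. It only shows that a truth test on an object with a length falls back to the length function, while an identity test against None does not.

```python
class WalkCountingDB:
    """Hypothetical stand-in for the dbm object; not the real dbm API."""

    def __init__(self):
        self._data = {}
        self._size = -1   # mirrors di_size: -1 means "size unknown"
        self.walks = 0    # number of full traversals performed so far

    def __setitem__(self, key, value):
        self._data[key] = value
        self._size = -1   # simplification: every store invalidates the cache

    def __len__(self):
        if self._size < 0:
            self.walks += 1
            # Count records one at a time, like the firstkey/nextkey loop.
            self._size = sum(1 for _ in self._data)
        return self._size


def import_with_truth_test(db, keys):
    """Store each key after an 'if not db' check; return the walk count."""
    for key in keys:
        if not db:        # truth test falls back to __len__
            pass
        db[key] = "x"
    return db.walks


def import_with_none_test(db, keys):
    """Store each key after an 'is None' check; return the walk count."""
    for key in keys:
        if db is None:    # identity test, never calls __len__
            pass
        db[key] = "x"
    return db.walks
```

In this model the truth-test variant performs one full traversal per stored key, and each traversal gets longer as the database grows, while the "is None" variant performs none.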

By changing the check in update_searchdb() to:

    if fmri_list:
            if self.searchdb is None:
                    self.searchdb = \
                        dbm.open(self.searchdb_file, "c")

we were able to work around the problem, using the "is None" check instead of "if not self.searchdb". However, this seems like it is really a problem with the dbmmodule and should ultimately be resolved there.

----------
components: Extension Modules
messages: 62668
nosy: johansen
severity: normal
status: open
title: dbmmodule inquiry function is performance prohibitive
type: resource usage
versions: Python 2.4

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2159>
__________________________________