Re: Dictionary or Database—Please advise

mk Fri, 26 Feb 2010 10:23:36 -0800

Jeremy wrote:

Shelve looks like an interesting option, but what might pose an issue
is that I'm reading the data from a disk instead of memory.  I didn't
mention this in my original post, but I was hoping that by using a
database it would be more memory efficient in storing data in RAM so I
wouldn't have to read from (or swap to/from) disk.  Would using the
shelve package make reading/writing data from disk faster since it is
in a binary format?


Read the docs:

"class shelve.BsdDbShelf(dict[, protocol=None[, writeback=False]])¶

A subclass of Shelf which exposes first(), next(), previous(),last() and set_location() which are available in the bsddb module butnot in other database modules. The dict object passed to the constructormust support those methods. This is generally accomplished by callingone of bsddb.hashopen(), bsddb.btopen() or bsddb.rnopen(). The optionalprotocol and writeback parameters have the same interpretation as forthe Shelf class."

Apparently using shelve internally gives you option of using bsddb,which is good news: bsddb is B-tree DB, which is highly efficient forfinding keys. I would recommend bsddb.btopen(), as it creates B-tree DB(perhaps other implementations, like anydb or hash db are good as well,but I personally didn't test them out).

I can't say for Berkeley DB implementation, but in general B-treealgorithm has O(log2 n) complexity for finding keys, which roughly meansthat if you need to find particular key in a db of 1 million keys,you'll probably need ~20 disk accesses (or even less if some keys lookedat in the process of search happen to be in the same disk sectors). Soyes, it's highly efficient.

Having said that, remember that disk is many orders of magnitude slowerthan RAM, so it's no free lunch.. Nothing will beat memory-based datastructure when it comes to speed (well new flash or hybrid disks perhapscould significantly improve in comparison to current mainstreammechanical-platter disks? there are some hyper-fast storage hardwarecompanies out there, although they tend to charge arm and leg for theirstuff for now).

Caveat: Berkeley DB is dual-licensed -- if you're using it forcommercial work, it might be that you'd need to buy a license for it.Although I have had no experience with this really, if someone here didperhaps they will shed some light on it?


Regards,
mk



--
http://mail.python.org/mailman/listinfo/python-list

Re: Dictionary or Database—Please advise

Reply via email to