On 05/04/12 12:22, Steve Howell wrote:
> Which variant do you recommend?
>
> """ anydbm is a generic interface to variants of the DBM database
> — dbhash (requires bsddb), gdbm, or dbm. If none of these modules
> is installed, the slow-but-simple implementation in module
> dumbdbm will be used.
>
> """

If you use the stock anydbm module, it automatically chooses the
best backend it knows of from the ones available:

    import os
    import hashlib
    import random
    from string import letters

    import anydbm

    KB = 1024
    MB = KB * KB
    GB = MB * KB
    DESIRED_SIZE = 6 * GB
    KEYS_TO_SAMPLE = 20
    FNAME = "mydata.db"

    i = 0
    db = anydbm.open(FNAME, 'c')
    try:
        print("Generating junk data...")
        # NB: some backends append their own suffix to FNAME on disk,
        # in which case the path passed to getsize() needs adjusting
        while os.path.getsize(FNAME) < DESIRED_SIZE:
            # key = first 16 hex digits of the MD5 of the counter
            key = hashlib.md5(str(i)).hexdigest()[:16]
            size = random.randrange(1*KB, 4*KB)
            value = ''.join(random.choice(letters) for _ in range(size))
            db[key] = value
            i += 1
        print("Gathering %i sample keys" % KEYS_TO_SAMPLE)
        keys_of_interest = random.sample(db.keys(), KEYS_TO_SAMPLE)
    finally:
        db.close()

    print("Reopening for a cold sample set in case it matters")
    db = anydbm.open(FNAME)
    try:
        print("Performing %i lookups" % KEYS_TO_SAMPLE)
        for key in keys_of_interest:
            v = db[key]
        print("Done")
    finally:
        db.close()

(Your specs said ~6GB of data, keys up to 16 characters, and values
of 1k-4k, so this should generate such data.)

-tkc
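
If you want to see which backend anydbm actually picked on a given machine, something along these lines should do it. This is only a sketch: the helper name backend_for and the file name probe.db are made up for illustration, and the try/except import is an assumption added so the same snippet also runs on Python 3, where anydbm and whichdb were folded into the dbm package.

```python
import os
import shutil
import tempfile

try:
    # Python 2 module names, matching the script above
    import anydbm
    from whichdb import whichdb
except ImportError:
    # Python 3: both live in the dbm package
    import dbm as anydbm
    from dbm import whichdb


def backend_for(path):
    """Create a tiny database at `path` and report which module handles it.

    Hypothetical helper, not part of anydbm itself.
    """
    db = anydbm.open(path, 'c')
    try:
        db[b'probe'] = b'x'   # write one record so the file has real content
    finally:
        db.close()
    # whichdb() inspects the file(s) on disk and returns the module name,
    # '' if the format is not recognized, or None if nothing can be opened
    return whichdb(path)


tmpdir = tempfile.mkdtemp()
try:
    backend = backend_for(os.path.join(tmpdir, "probe.db"))
    print("anydbm chose: %s" % backend)
finally:
    shutil.rmtree(tmpdir, ignore_errors=True)
```

The name printed depends on what is installed, so timings from the script above are only comparable between machines that report the same backend.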