Robert Brewer wrote:
Martin MOKREJŠ wrote:

Robert Brewer wrote:

Martin MOKREJŠ wrote:


I have sets.Set() objects with up to 20E20 items,
each composed of up to 20 characters. Keeping
them in memory on a 1 GB machine quickly puts me into swap.
I don't want to use the dictionary approach, as I see no sense
in storing None as a value. The items in a set are unique.

How can I write them efficiently to disk?


got shelve*?

I know about shelve, but doesn't it work like a dictionary? Why should I use shelve for this? It would be faster to use bsddb directly, with the string as a key and None as a value, I'd guess.
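[Editor's note: a minimal sketch of the key-only layout under discussion — set items as shelve keys with None values. The file name is hypothetical; shelve pickles each None, so every entry carries a few bytes of value overhead:]

```python
import os
import shelve
import tempfile

# Store each set item as a key; shelve pickles the None values,
# so membership tests run against the disk file rather than RAM.
path = os.path.join(tempfile.mkdtemp(), "items")
db = shelve.open(path)
for item in ("ACGT" * 5, "TTGA" * 5):   # sample 20-character items
    db[item] = None
found = ("TTGA" * 5) in db              # disk-backed membership test
db.close()
```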


If you're using Python 2.3, then a sets.Set *is* implemented with

Yes, I do.

a dictionary, with None values. It simply has some extra methods to
make it behave like a set. In addition, the Set class already has
builtin methods for pickling and unpickling.

Really? Does Set() have such a method to pickle efficiently? I haven't seen it in the docs.
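[Editor's note: in Python 2.3 the sets module supported the pickle protocol out of the box; the module is gone in modern Python, but the built-in set pickles the same way. A minimal sketch — note that pickling still requires the whole set in memory, so it helps with persistence, not with the swapping problem:]

```python
import pickle

# Round-trip a set through the pickle protocol.
s = set(("ACGT" * 5, "TTGA" * 5))       # sample 20-character items
data = pickle.dumps(s, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)           # restores an equal set object
```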


So it's probably faster to use bsddb directly, but why not find out by trying 2 lines of code that uses shelve? The time-consuming part

Because I don't know how I can affect indexing when using bsddb, for example. Say, create an index only on the first keysize-1 or keysize-2 characters of a keystring.

How can I delay indexing so that the index isn't rebuilt after every addition
of a new key? I want to trigger it at the end of the loop that adds new keys.

Even better, can I turn off indexing completely (to save space)?
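[Editor's note: bsddb has since left the standard library, but the same key-only idea can be sketched with the stdlib dbm module, which keeps a single hash file and no secondary index to rebuild. Storing empty byte strings instead of pickled Nones avoids the per-entry pickle overhead:]

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "keys")
db = dbm.open(path, "c")        # "c": create the file if it doesn't exist
for item in (b"ACGT" * 5, b"TTGA" * 5):
    db[item] = b""              # key-only storage; the value carries no data
present = (b"ACGT" * 5) in db   # membership test against the disk file
db.close()
```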

of your quest is writing the timed test suite that will indicate
which route will be fastest, which you'll have to do regardless.

Unfortunately, I was hoping first to get an idea of what can be made faster, and how, when using sets and dictionaries.
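[Editor's note: the timed test suite suggested above can be sketched with timeit; the sizes and key formats here are arbitrary. It compares membership tests on an in-memory set against a dbm file built from the same keys:]

```python
import timeit

# setup builds the same 10,000 keys as an in-memory set and a dbm file.
setup = """
import dbm, os, tempfile
keys = ["item%06d" % i for i in range(10000)]
s = set(keys)
db = dbm.open(os.path.join(tempfile.mkdtemp(), "bench"), "c")
for k in keys:
    db[k] = ""
"""
t_mem = timeit.timeit('"item005000" in s', setup=setup, number=1000)
t_disk = timeit.timeit('"item005000" in db', setup=setup, number=1000)
```

Typically the in-memory lookup wins by a wide margin; the disk variant's virtue is bounded memory use, which is the original constraint here.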

M.
--
http://mail.python.org/mailman/listinfo/python-list
