Istvan Albert wrote:
Martin MOKREJŠ wrote:


But nevertheless, imagine 1E6 words of size 15. That's maybe 1.5GB of raw
data. Do you think sets will be appropriate?


You started out with 20E20 keys, then cut back to 1E15; now it is down to
one million, but you claim that these will take 1.5 GB.
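For scale, the raw character payload alone is easy to check in an
interpreter (this is just 1E6 * 15 bytes; Python's per-object overhead
comes on top of that):

    >>> 10**6 * 15      # 1E6 words x 15 characters, in bytes
    15000000

That is roughly 15 MB of raw data, two orders of magnitude below 1.5 GB.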

I have given up on the theoretical approach. In practice, I might need to store up to maybe those 1E15 keys.

So you are saying that 1 million words are better stored as dictionary keys
than in a set, and that I should use my own function to extract the unique
or common words?
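For reference, the set-based version of "unique or common words" needs no
custom function at all. A minimal sketch, where words_a and words_b stand
in for whatever two word collections are being compared (the example
strings are made up):

    # hypothetical word collections; in practice these would be read from files
    words_a = {"ACGTACGTACGTACG", "TTTTTTTTTTTTTTT"}
    words_b = {"ACGTACGTACGTACG", "GGGGGGGGGGGGGGG"}

    common      = words_a & words_b     # words present in both sets
    unique_to_a = words_a - words_b     # words only in the first set

With a dictionary the same thing takes an explicit loop (or, on Python 3,
set operations on the key views), which is presumably the "own function"
referred to above.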


On my system, storing 1 million words of length 15 as keys of a Python dictionary takes around 75 MB.
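One way to check a figure like that on your own machine, on a current
Python 3 (a rough sketch; sys.getsizeof counts the dict table and the key
strings but not allocator overhead, so treat the result as approximate):

    import sys, random, string

    # build 1E6 fifteen-character words as dictionary keys
    d = {''.join(random.choices(string.ascii_lowercase, k=15)): None
         for _ in range(10**6)}

    approx = sys.getsizeof(d) + sum(sys.getsizeof(k) for k in d)
    print("approx %.0f MB" % (approx / 2**20))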

Fine, that's what I wanted to hear. How do you improve the algorithm? Do you delay indexing until the very last moment, or do you let your computer re-index 999,999 times just for fun?
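One sketch of the "delay it until the end" idea, under the assumption that
the words sit one per line in a plain text file (the file name and the
helper are mine, purely for illustration): build the whole dictionary in a
single pass, and only then do anything index-like with it.

    from collections import Counter

    def count_words(path):
        # one pass over the file; the dictionary is only written out or
        # queried once it is complete, instead of re-indexing per insertion
        with open(path) as fh:
            return Counter(line.strip() for line in fh)

    counts = count_words("words.txt")   # "words.txt" is a placeholder name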


I.

-- http://mail.python.org/mailman/listinfo/python-list
