Paul McGuire wrote:
"Martin MOKREJÂ" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Hi,
I have sets.Set() objects having up to 20E20 items,
each is composed of up to 20 characters. Keeping
them in memory on !GB machine put's me quickly into swap.
I don't want to use dictionary approach, as I don't see a sense
to store None as a value. The items in a set are unique.
How can I write them efficiently to disk? To be more exact,
I have 20 sets. _set1 has 1E20 keys of size 1 character.
alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M', 'A', 'Q',
'F', 'Y', 'W', 'K', 'R', 'H', 'D', 'E')
for aa1 in alphabet:
# l = [aa1]
#_set1.add(aa1)
for aa2 in alphabet:
# l.append(aa2)
#_set2.add(''.join(l))
[cut]
The reason I went for sets instead of lists is the speed,
availability of unique, common and other methods.
What would you propose as an elegant solution?
Actually, even those nested for loops take ages. :(
M.
_set1 only has 19 keys of size 1 character - 'A' is duplicated.
Ahh, you are right. That's a typo, yet I have to figure out which char
I have to use. But 'Z' will do for the tests anyway.
Assuming you replace 'A' with another character (say 'Z'), then here is what
you get:
_set1 - 20 elements (20**1)
_set2 - 400 elements (20**2)
_set3 - 8000 elements (20**3)
...
_set20 - 20**20 ~ 10 ^ (1.301*20) or 1E26
If you have no duplicates in your alphabet, then you wont need to use sets,
every combination will be unique. In this case, just use a generator.
As I noted in later response, actually I will compare these ideal sets to
some
real world examples. I have no expectation how many entries will be common
and how many unique. The only thing I know even in real world, I might be in
size ranges of 2E6 or perhaps 2E8?
Here's a recursive generator approach that may save you a bunch of nested
editing (set maxDepth as high as you dare):
alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M', 'Z', 'Q', 'F',
'Y', 'W', 'K', 'R', 'H', 'D', 'E')
maxDepth = 3
def genNextCombo(root=list(),depth=1):
for a in alphabet:
thisRoot = root + [a]
yield "".join( thisRoot )
if depth < maxDepth:
for c in genNextCombo(thisRoot, depth+1):
yield c
for c in genNextCombo():
print c # or write to file, or whatever
Works nice, but http://www.python.org/doc/2.3.4/ref/yield.html
doesn't really tell what happens here. :( But is quite fast and
definitely saves those the stupid recursively hand-written for
loops.
Thanks Paul!
M.
--
http://mail.python.org/mailman/listinfo/python-list