Re: Writing huge Sets() to disk

2005-01-17 Thread Duncan Booth
Martin MOKREJŠ wrote: > Duncan Booth wrote: >> Almost anything you do copies references. > > > But what does this?: > > x = 'x' Copies a reference to the existing string 'x' and stores the new reference in the variable x. If it's a global variable then this will involve creating a new
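A minimal sketch of the point about references, in present-day Python rather than the 2005 interpreter used in the thread. The names `big`, `alias`, and `shallow` are made up for illustration; the list merely stands in for the large dictionaries under discussion.

```python
# A list stands in for the large containers discussed in the thread.
big = list(range(1000))

alias = big          # binds a second name to the same object; no data is copied
shallow = list(big)  # builds a new list, but its slots refer to the same items

print(alias is big)              # the two names share one object
print(shallow is big)            # the shallow copy is a separate container
print(shallow[500] is big[500])  # but the items themselves are shared

alias.append(-1)                 # a change through one name is visible through the other
print(len(big), len(shallow))
```

Under these assumptions, plain assignment costs one reference, while building a new container costs one reference per item on top of the container itself.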

Re: Writing huge Sets() to disk

2005-01-17 Thread Martin MOKREJŠ
Steve Holden wrote: Martin MOKREJŠ wrote: Hi, could someone tell me what all does and what all doesn't copy references in Python. I have found my script after reaching some state and taking say 600MB, pushes its internal dictionaries to hard disk. The for loop consumes another 300MB (as gathered

Re: Writing huge Sets() to disk

2005-01-17 Thread Martin MOKREJŠ
Duncan Booth wrote: Martin MOKREJŠ wrote: Hi, could someone tell me what all does and what all doesn't copy references in Python. I have found my script after reaching some state and taking say 600MB, pushes its internal dictionaries to hard disk. The for loop consumes another 300MB (as gathered

Re: Writing huge Sets() to disk

2005-01-17 Thread Steve Holden
Martin MOKREJŠ wrote: Hi, could someone tell me what all does and what all doesn't copy references in Python. I have found my script after reaching some state and taking say 600MB, pushes its internal dictionaries to hard disk. The for loop consumes another 300MB (as gathered by vmstat) to push t

Re: Writing huge Sets() to disk

2005-01-17 Thread Duncan Booth
Martin MOKREJŠ wrote: > Hi, > could someone tell me what all does and what all doesn't copy > references in Python. I have found my script after reaching some > state and taking say 600MB, pushes its internal dictionaries > to hard disk. The for loop consumes another 300MB (as gathered > by vms

Re: Writing huge Sets() to disk

2005-01-17 Thread Martin MOKREJŠ
Hi, could someone tell me what all does and what all doesn't copy references in Python. I have found my script after reaching some state and taking say 600MB, pushes its internal dictionaries to hard disk. The for loop consumes another 300MB (as gathered by vmstat) to push the data to dictionarie

Re: Writing huge Sets() to disk

2005-01-14 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] This comm(1) approach doesn't work for me. It somehow fails to detect common entries when the offset is too big. [...] I'll repeat: As I mentioned before, if you store keys in sorted text files ... Those files aren't in sorted order, so of course `comm` can't do
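To illustrate why sort order matters for a comm(1)-style comparison, here is a rough Python sketch of a single-pass merge over two sorted sequences. The function name `common_sorted` and the small sample lists are assumptions for illustration; only the idea (keys ordered by length instead of plain string order) comes from the thread.

```python
def common_sorted(a, b):
    """Yield items present in both a and b.

    Both inputs must be sorted ascending in the same order; the function
    advances whichever side currently holds the smaller item.
    """
    ia, ib = iter(a), iter(b)
    try:
        x, y = next(ia), next(ib)
        while True:
            if x == y:
                yield x
                x, y = next(ia), next(ib)
            elif x < y:
                x = next(ia)
            else:
                y = next(ib)
    except StopIteration:
        return  # one side is exhausted, so no further matches are possible


# Hypothetical sample: keys ordered by length first, not by string comparison.
by_length = ["A", "F", "G", "AA", "AI", "FG"]
other = ["AI", "G"]  # already in plain string order

print(list(common_sorted(by_length, other)))          # misses a common key
print(list(common_sorted(sorted(by_length), other)))  # finds both common keys
```

With the length-ordered input the merge walks past "AI" before it is ever compared, which matches the symptom described in the thread; sorting both inputs the same way first avoids that.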

Re: Writing huge Sets() to disk

2005-01-14 Thread Tim Peters
[Martin MOKREJŠ] > This comm(1) approach doesn't work for me. It somehow fails to > detect common entries when the offset is too big. > > file 1: > > A > F > G > I > K > M > N > R > V > AA > AI > FG > FR > GF > GI > GR > IG > IK > IN > IV > KI > MA > NG > RA > RI > VF > AIK > FGR > FRA > GFG > GIN

Re: Writing huge Sets() to disk

2005-01-14 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] ... I gave up the theoretical approach. Practically, I might need up to store maybe those 1E15 keys. We should work on our multiplication skills here. You don't have enough disk space to store 1E15 keys. If your keys were just one byte each, you would need to

Re: Writing huge Sets() to disk

2005-01-11 Thread Bengt Richter
On Mon, 10 Jan 2005 17:11:09 +0100, Martin MOKREJŠ <[EMAIL PROTECTED]> wrote: >Hi, > I have sets.Set() objects having up to 20E20 items, What notation are you using when you write 20E20? IOW, ISTM 1E9 is a billion. So 20E20 would be 2000 billion billion. Please clarify ;-) >e

Re: Writing huge Sets() to disk

2005-01-10 Thread Tim Peters
[Istvan Albert] > #- I think that you need to first understand how dictionaries work. > #- The time needed to insert a key is independent of > #- the number of values in the dictionary. [Batista, Facundo] > Are you sure? > > I think that is true while the hashes don't collide. If you have co
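The exchange about collisions can be made concrete with a small experiment. This is only a sketch: the `Key` class and the helper `comparisons_to_insert` are hypothetical, and the behaviour shown is that of current CPython dictionaries, not necessarily the 2005 implementation. It counts equality comparisons instead of measuring time, so the result does not depend on the machine.

```python
class Key:
    """Hypothetical key type that counts equality comparisons."""

    eq_calls = 0

    def __init__(self, value, hash_value):
        self.value = value
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def comparisons_to_insert(n, colliding):
    """Insert n distinct keys into a dict and return how often __eq__ ran."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        # Either every key gets its own hash, or all keys share hash 0.
        table[Key(i, 0 if colliding else i)] = None
    return Key.eq_calls


spread = comparisons_to_insert(500, colliding=False)
clumped = comparisons_to_insert(500, colliding=True)
print(spread, clumped)
```

When every key hashes to the same value, each insertion has to be compared against the keys already stored, so the total work grows much faster than the number of keys; with well-spread hashes the cost per insertion stays roughly constant.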

Re: Writing huge Sets() to disk

2005-01-10 Thread Paul Rubin
Martin MOKREJŠ <[EMAIL PROTECTED]> writes: > Yes, I'm. I still don't get what that acronym CLRS stands for ... :( CLRS = the names of the authors, Cormen, Leiserson, Rivest, and Stein, if I spelled those correctly. :) -- http://mail.python.org/mailman/listinfo/python-list

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Paul Rubin wrote: Paul Rubin writes: handle with builtin Python operations without putting some thought into algorithms and data structures. From "ribosome" I'm guessing you're doing computational biology. If you're going to be writing Well, trying sort of ... Not much

Re: Writing huge Sets() to disk

2005-01-10 Thread Paul Rubin
Paul Rubin writes: > handle with builtin Python operations without putting some thought > into algorithms and data structures. From "ribosome" I'm guessing > you're doing computational biology. If you're going to be writing > code for these kinds of problems on a regula

Re: Writing huge Sets() to disk

2005-01-10 Thread Paul Rubin
Martin MOKREJŠ <[EMAIL PROTECTED]> writes: > >> I have sets.Set() objects having up to 20E20 items, > just imagine, you want to compare how many words are in English, German, > Czech, Polish dictionary. You collect words from every language and record > them in dict or Set, as you wish. > >

Re: Writing huge Sets() to disk

2005-01-10 Thread John Lenton
On Tue, Jan 11, 2005 at 12:33:42AM +0200, Simo Melenius wrote: > "John Lenton" <[EMAIL PROTECTED]> writes: > > > you probably want to look into building set-like objects on top of > > tries, given the homogeneity of your language. You should see > > improvements both in size and speed. > > Ternary

Re: Writing huge Sets() to disk

2005-01-10 Thread Scott David Daniels
Martin MOKREJŠ wrote: But I don't think I can use one-way hashes, as I need to reconstruct the string later. I have to study hard to get an idea what the proposed code really does. Scott David Daniels wrote: Tim Peters wrote: Call the set of all English words E; G, C, and P similarly.

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Simo Melenius wrote: "John Lenton" <[EMAIL PROTECTED]> writes: you probably want to look into building set-like objects on top of tries, given the homogeneity of your language. You should see improvements both in size and speed. Ternary search trees give _much_ better space-efficiency compared to

Re: Writing huge Sets() to disk

2005-01-10 Thread Simo Melenius
"John Lenton" <[EMAIL PROTECTED]> writes: > you probably want to look into building set-like objects on top of > tries, given the homogeneity of your language. You should see > improvements both in size and speed. Ternary search trees give _much_ better space-efficiency compared to tries, at the e

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Dear Scott, thank you for your excellent email. I also thought about using some zip() function to compress the strings first before using them as keys in a dict. But I don't think I can use one-way hashes, as I need to reconstruct the string later. I have to study hard to get an idea what the prop

Re: Writing huge Sets() to disk

2005-01-10 Thread Scott David Daniels
Tim Peters wrote: [Martin MOKREJŠ] just imagine, you want to compare how many words are in English, German, Czech, Polish dictionary. You collect words from every language and record them in dict or Set, as you wish. Call the set of all English words E; G, C, and P similarly. Once you have those S

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] ... I gave up the theoretical approach. Practically, I might need up to store maybe those 1E15 keys. We should work on our multiplication skills here. You don't have enough disk space to store 1E15 keys. If your keys were just one byte each, you would need to

RE: Writing huge Sets() to disk

2005-01-10 Thread Batista, Facundo
[Istvan Albert] #- I think that you need to first understand how dictionaries work. #- The time needed to insert a key is independent of #- the number of values in the dictionary. Are you sure? I think that is true while the hashes don't co

Re: Writing huge Sets() to disk

2005-01-10 Thread John Lenton
you probably want to look into building set-like objects on top of tries, given the homogeneity of your language. You should see improvements both in size and speed.
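For readers unfamiliar with the suggestion, a set-like object on top of a trie could look roughly like the sketch below. The class name `TrieSet` and its layout are assumptions for illustration, not code from the thread. Note also that nested Python dictionaries are themselves memory-hungry, so this shows the structure (shared prefixes are stored once) and not the space savings a compact implementation might achieve.

```python
class TrieSet:
    """Minimal set of strings stored as a trie of nested dictionaries."""

    _END = ""  # marker key; it cannot collide with single-character keys

    def __init__(self):
        self._root = {}
        self._size = 0

    def add(self, word):
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})  # reuse the branch for a shared prefix
        if self._END not in node:
            node[self._END] = True
            self._size += 1

    def __contains__(self, word):
        node = self._root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return self._END in node

    def __len__(self):
        return self._size

    def __iter__(self):
        # Depth-first walk that rebuilds each stored word from its path.
        stack = [("", self._root)]
        while stack:
            prefix, node = stack.pop()
            for ch, child in node.items():
                if ch == self._END:
                    yield prefix
                else:
                    stack.append((prefix + ch, child))


words = TrieSet()
for w in ["AIK", "AI", "FGR", "AI"]:
    words.add(w)

print(len(words), "AI" in words, "A" in words)
print(sorted(words))
```

Because the original strings can be rebuilt by walking the trie, this kind of structure keeps the keys recoverable, unlike a one-way hash.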

Re: Writing huge Sets() to disk

2005-01-10 Thread Tim Peters
[Martin MOKREJŠ] > ... > > I gave up the theoretical approach. Practically, I might need up > to store maybe those 1E15 keys. We should work on our multiplication skills here. You don't have enough disk space to store 1E15 keys. If your keys were just one byte each, you would need to have 4 th
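The multiplication being asked for is easy to check. A back-of-the-envelope sketch follows; the helper name `raw_bytes` is made up, the 20-byte case assumes one byte per character for the "up to 20 characters" keys mentioned in the original post, and no per-key overhead or separators are counted.

```python
def raw_bytes(n_keys, bytes_per_key):
    """Raw payload size in bytes, ignoring any storage overhead."""
    return n_keys * bytes_per_key


TERABYTE = 10**12  # decimal terabyte

n_keys = 10**15
print(raw_bytes(n_keys, 1) / TERABYTE)   # terabytes needed for one-byte keys
print(raw_bytes(n_keys, 20) / TERABYTE)  # terabytes needed for 20-byte keys
```

Even at one byte per key the raw payload is a thousand terabytes, which is the point being made in the reply.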

Re: Writing huge Sets() to disk

2005-01-10 Thread Istvan Albert
Martin MOKREJŠ wrote: Istvan Albert wrote: So you say 1 million words is better to store in dictionary than in a set and use your own function to get out those unique or common words? I have said nothing even remotely like that. Fine, that's what I wanted to hear. How do you improve the algorithm?

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Istvan Albert wrote: Martin MOKREJŠ wrote: But nevertheless, imagine 1E6 words of size 15. That's maybe 1.5GB of raw data. Will sets be appropriate you think? You started out with 20E20 then cut back to 1E15 keys now it is down to one million but you claim that these will take 1.5 GB. I gave up t

Re: Writing huge Sets() to disk

2005-01-10 Thread Istvan Albert
Martin MOKREJŠ wrote: But nevertheless, imagine 1E6 words of size 15. That's maybe 1.5GB of raw data. Will sets be appropriate you think? You started out with 20E20 then cut back to 1E15 keys now it is down to one million but you claim that these will take 1.5 GB. On my system storing 1 million wo

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] just imagine, you want to compare how many words are in English, German, Czech, Polish dictionary. You collect words from every language and record them in dict or Set, as you wish. Call the set of all English words E; G, C, and P similarly. Once you have those

Re: Writing huge Sets() to disk

2005-01-10 Thread Tim Peters
[Martin MOKREJŠ] > just imagine, you want to compare how many words are in English, German, > Czech, Polish dictionary. You collect words from every language and record > them in dict or Set, as you wish. Call the set of all English words E; G, C, and P similarly. > Once you have those Sets o
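Using the names E, G, C, and P from the message above, the kind of comparison being discussed might be sketched as follows. The tiny word lists are invented for illustration, and the example uses the builtin `set` type of current Python rather than the `sets.Set` class named in the thread.

```python
# Toy word lists; the real sets in the thread are far larger.
E = {"hand", "wind", "angel", "robot"}
G = {"hand", "wind", "engel", "roboter"}
C = {"ruka", "robot", "andel"}
P = {"reka", "robot", "aniol"}

shared_by_all = E & G & C & P   # words present in every language
english_only = E - (G | C | P)  # words found in English and nowhere else
in_any = E | G | C | P          # every distinct word seen at all

print(sorted(E & G))
print(sorted(english_only))
print(len(shared_by_all), len(in_any))
```

Each operator builds a new set in memory, so for very large inputs the intermediate results are part of the memory cost as well.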

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Adam DePrince wrote: On Mon, 2005-01-10 at 11:11, Martin MOKREJŠ wrote: Hi, I have sets.Set() objects having up to 20E20 items, each is composed of up to 20 characters. Keeping them in memory on a 1GB machine puts me quickly into swap. I don't want to use dictionary approach, as I don't see a sense

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Robert Brewer wrote: Martin MOKREJŠ wrote: Robert Brewer wrote: Martin MOKREJŠ wrote: I have sets.Set() objects having up to 20E20 items, each is composed of up to 20 characters. Keeping them in memory on a 1GB machine puts me quickly into swap. I don't want to use dictionary approach, as I don't s

Re: Writing huge Sets() to disk

2005-01-10 Thread Adam DePrince
On Mon, 2005-01-10 at 11:11, Martin MOKREJŠ wrote: > Hi, > I have sets.Set() objects having up to 20E20 items, > each is composed of up to 20 characters. Keeping > them in memory on a 1GB machine puts me quickly into swap. > I don't want to use dictionary approach, as I don't see a sense > to stor

RE: Writing huge Sets() to disk

2005-01-10 Thread Robert Brewer
Martin MOKREJŠ wrote: > Robert Brewer wrote: > > Martin MOKREJŠ wrote: > > > >> I have sets.Set() objects having up to 20E20 items, > >>each is composed of up to 20 characters. Keeping > >>them in memory on a 1GB machine puts me quickly into swap. > >>I don't want to use dictionary approach, as I

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Paul McGuire wrote: "Martin MOKREJŠ" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Hi, I have sets.Set() objects having up to 20E20 items, each is composed of up to 20 characters. Keeping them in memory on a 1GB machine puts me quickly into swap. I don't want to use dictionary approac

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Robert Brewer wrote: Martin MOKREJŠ wrote: I have sets.Set() objects having up to 20E20 items, each is composed of up to 20 characters. Keeping them in memory on a 1GB machine puts me quickly into swap. I don't want to use dictionary approach, as I don't see a sense to store None as a value. The it

Re: Writing huge Sets() to disk

2005-01-10 Thread Paul McGuire
"Martin MOKREJŠ" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Hi, > I have sets.Set() objects having up to 20E20 items, > each is composed of up to 20 characters. Keeping > them in memory on a 1GB machine puts me quickly into swap. > I don't want to use dictionary approach, as I d

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Batista, Facundo wrote: [Martin MOKREJŠ] #- > At least you'll need a disk of 34694 EXABYTES!!! #- #- Hmm, you are right. So 20E15 then? I definitely need to be Right. Now you only need 355 PETABytes. Nowadays disk is cheap, but... #- in range 1-14. ;-) Why? I need to test for occurrence every such

RE: Writing huge Sets() to disk

2005-01-10 Thread Batista, Facundo
[Martin MOKREJŠ] #- I have sets.Set() objects having up to 20E20 items, #- each is composed of up to 20 characters. Keeping Are you really sure?? #- How can I write them efficiently to disk? To be more exact, I think that there's

RE: Writing huge Sets() to disk

2005-01-10 Thread Batista, Facundo
[Martin MOKREJŠ] #- > At least you'll need a disk of 34694 EXABYTES!!! #- #- Hmm, you are right. So 20E15 then? I definitely need to be Right. Now you only need 355 PETABytes. Nowadays disk is cheap, but... #- in range 1-14. ;

RE: Writing huge Sets() to disk

2005-01-10 Thread Robert Brewer
Martin MOKREJŠ wrote: > I have sets.Set() objects having up to 20E20 items, > each is composed of up to 20 characters. Keeping > them in memory on a 1GB machine puts me quickly into swap. > I don't want to use dictionary approach, as I don't see a sense > to store None as a value. The items in a s

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Batista, Facundo wrote: [Martin MOKREJŠ] #- I have sets.Set() objects having up to 20E20 items, #- each is composed of up to 20 characters. Keeping Are you really sure?? Either I'll have to construct them all over again say 20-30 times, or I'll find a way to keep them on disk. #- How can I wri

Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Hi, I have sets.Set() objects having up to 20E20 items, each is composed of up to 20 characters. Keeping them in memory on a 1GB machine puts me quickly into swap. I don't want to use dictionary approach, as I don't see a sense to store None as a value. The items in a set are unique. How can I wri