Re: Writing huge Sets() to disk

2005-01-14 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] This comm(1) approach doesn't work for me. It somehow fails to detect common entries when the offset is too big. [...] I'll repeat: As I mentioned before, if you store keys in sorted text files ... Those files aren't in sorted order, so of course ...
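
comm(1) assumes both inputs are already sorted in the same collation order. On unsorted input it can miss common lines, which matches the failure described here. Below is a minimal sketch, assuming one key per line and plain code-point ordering (the order Python's `sorted()` produces, and what `LC_ALL=C sort` gives for UTF-8 text). It writes a set out in sorted order and checks an existing file before you trust comm's output. The helper names `sorted_lines` and `first_unsorted_line` are hypothetical. `sorted()` needs the whole set in memory; for a file too large for RAM, sort(1) can do the sort on disk instead.

```python
def sorted_lines(keys):
    """Yield keys one per line in ascending (code-point) order.

    Usage (hypothetical file name):
        with open("keys.txt", "w", encoding="utf-8") as f:
            f.writelines(sorted_lines(my_set))
    """
    for key in sorted(keys):
        yield key + "\n"


def first_unsorted_line(lines):
    """Return the 1-based number of the first line that is smaller than
    the line before it, or None if the lines are in ascending order."""
    previous = None
    for lineno, line in enumerate(lines, start=1):
        key = line.rstrip("\n")
        if previous is not None and key < previous:
            return lineno
        previous = key
    return None


print(first_unsorted_line(["pear\n", "apple\n", "zebra\n"]))  # 2
print(first_unsorted_line(sorted_lines({"pear", "apple", "zebra"})))  # None
```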

Re: Writing huge Sets() to disk

2005-01-14 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] ... I gave up the theoretical approach. Practically, I might need to store maybe up to those 1E15 keys. We should work on our multiplication skills here. You don't have enough disk space to store 1E15 keys. If your keys were just one byte each, you would need ...
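
The arithmetic behind that objection: 1E15 keys at one byte each is 10^15 bytes, a full petabyte, before counting separators or any index. Below is a small sketch of the calculation. The 20-character key length and one newline byte per key are assumptions borrowed from the original question ("up to 20 characters"). `raw_bytes` is a hypothetical helper, and decimal units are used throughout.

```python
# Decimal units: 1 TB = 10**12 bytes, 1 PB = 10**15 bytes.
TB = 10**12
PB = 10**15


def raw_bytes(n_keys, bytes_per_key, separator_bytes=0):
    """Raw storage for n_keys keys of a fixed size, plus per-key separators
    (e.g. 1 byte for a newline in a text file). Ignores any index overhead."""
    return n_keys * (bytes_per_key + separator_bytes)


one_byte_keys = raw_bytes(10**15, 1)
twenty_char_lines = raw_bytes(10**15, 20, separator_bytes=1)  # assumed 20 chars + "\n"

print(one_byte_keys / PB, "PB")      # 1.0 PB
print(one_byte_keys / TB, "TB")      # 1000.0 TB
print(twenty_char_lines / PB, "PB")  # 21.0 PB
```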

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Tim Peters wrote: [Tim Peters] As I mentioned before, if you store keys in sorted text files, you can do intersection and difference very efficiently just by using the Unix `comm` utility. [Martin MOKREJŠ] Now I got your point. I understand the comm(1) is written in C, but it still has to scan ...
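
comm(1) does scan both files, but only once each, in lockstep. Intersection and difference therefore take time linear in the combined file size and need almost no memory. Below is a minimal Python sketch of the same merge, assuming both inputs are sorted in the same code-point order (e.g. written by Python or by `LC_ALL=C sort`). `comm_merge` is a hypothetical name, and its column numbers mirror comm's three output columns.

```python
def comm_merge(lines_a, lines_b):
    """Merge two sorted line iterables the way comm(1) does.

    Yields (column, key): 1 = only in A, 2 = only in B, 3 = in both.
    Each input is read once, front to back, so memory use stays constant.
    """
    it_a, it_b = iter(lines_a), iter(lines_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None and b is not None:
        ka, kb = a.rstrip("\n"), b.rstrip("\n")
        if ka < kb:
            yield 1, ka
            a = next(it_a, None)
        elif kb < ka:
            yield 2, kb
            b = next(it_b, None)
        else:
            yield 3, ka
            a, b = next(it_a, None), next(it_b, None)
    # One input is exhausted; whatever remains in the other is unique to it.
    while a is not None:
        yield 1, a.rstrip("\n")
        a = next(it_a, None)
    while b is not None:
        yield 2, b.rstrip("\n")
        b = next(it_b, None)


# Usage with hypothetical file names: intersection, like `comm -12 a.txt b.txt`.
# with open("a.txt", encoding="utf-8") as fa, open("b.txt", encoding="utf-8") as fb:
#     common = [key for col, key in comm_merge(fa, fb) if col == 3]

pairs = list(comm_merge(["apple\n", "cherry\n", "kiwi\n"],
                        ["banana\n", "cherry\n", "kiwi\n", "plum\n"]))
print([k for col, k in pairs if col == 3])  # ['cherry', 'kiwi']
print([k for col, k in pairs if col == 1])  # ['apple']
```

Passing open file objects works because they iterate line by line, so neither file is ever loaded whole.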

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
... the proposed code really does. Scott David Daniels wrote: Tim Peters wrote: [Martin MOKREJŠ] just imagine, you want to compare how many words are in English, German, Czech, Polish dictionaries. You collect words from every language and record them in dict or Set, as you wish. Call the set of all English ...

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] ... I gave up the theoretical approach. Practically, I might need to store maybe up to those 1E15 keys. We should work on our multiplication skills here. You don't have enough disk space to store 1E15 keys. If your keys were just one byte each, you would need ...

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Istvan Albert wrote: Martin MOKREJŠ wrote: But nevertheless, imagine 1E6 words of size 15. That's maybe 1.5GB of raw data. Will sets be appropriate, you think? You started out with 20E20, then cut back to 1E15 keys, now it is down to one million, but you claim that these will take 1.5 GB. I ga ...
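
The raw figure is easy to check: 1E6 words of 15 one-byte characters is 1.5E7 bytes, about 15 MB, not 1.5 GB. The in-memory size is larger than the raw size because every Python string object and the set's hash table carry their own bookkeeping. Below is a rough sketch for estimating that overhead with `sys.getsizeof`, which reports CPython-specific, shallow sizes. `estimate_set_bytes` is a hypothetical helper, and the zero-padded numeric "words" are made-up sample data.

```python
import sys

N_WORDS = 10**6
WORD_LEN = 15  # characters per word, one byte each for ASCII text

raw = N_WORDS * WORD_LEN
print(raw / 10**6, "MB of raw characters")  # 15.0 MB of raw characters


def estimate_set_bytes(words):
    """Rough in-memory footprint of a set of strings: the set object's own
    hash table plus each distinct string object (CPython, shallow sizes)."""
    s = set(words)
    return sys.getsizeof(s) + sum(sys.getsizeof(w) for w in s)


# Made-up sample: 100,000 distinct 15-character words. Multiplying by 10
# gives only a rough extrapolation to one million words, since the set's
# table grows in steps rather than linearly.
sample = [format(i, "015d") for i in range(100_000)]
print(estimate_set_bytes(sample) * 10 / 10**6, "MB, rough estimate")
```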

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Tim Peters wrote: [Martin MOKREJŠ] just imagine, you want to compare how many words are in English, German, Czech, Polish dictionaries. You collect words from every language and record them in dict or Set, as you wish. Call the set of all English words E; G, C, and P similarly. Once you have ...
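
Tim's framing maps directly onto Python's set operators: `&` for intersection, `|` for union, `-` for difference. Below is a toy sketch with hypothetical, hand-made vocabularies. It uses the built-in `set` type, which superseded the `sets` module (`sets` was removed in Python 3). All four sets must fit in memory for this to work; for larger ones, the sorted-file and comm approach discussed in the thread avoids that.

```python
# Hypothetical toy vocabularies; real ones would be read from word lists.
E = {"hotel", "radio", "taxi", "house"}
G = {"hotel", "radio", "taxi", "haus"}
C = {"hotel", "radio", "taxi", "dům"}
P = {"hotel", "radio", "taxi", "dom"}

in_all_four = E & G & C & P        # words present in every language
english_only = E - (G | C | P)     # words found only in English
in_any = E | G | C | P             # combined vocabulary
shared_by_e_and_g = len(E & G)     # size of the English/German overlap

print(sorted(in_all_four))   # ['hotel', 'radio', 'taxi']
print(sorted(english_only))  # ['house']
print(len(in_any))           # 7
print(shared_by_e_and_g)     # 3
```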

Re: Writing huge Sets() to disk

2005-01-10 Thread Martin MOKREJŠ
Paul McGuire wrote: "Martin MOKREJŠ" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Hi, I have sets.Set() objects having up to 20E20 items, each composed of up to 20 characters. Keeping them in memory on a 1GB machine quickly puts me into swap. I don't want to use a dictionary approach ...
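
One way to keep a set on disk rather than in RAM is to use Python 3's standard `dbm` module (Python 2's `anydbm`) as a key-only store with a dummy value. Below is a minimal sketch of that idea, not the approach the thread settled on. The class name `DiskSet` and the file location are hypothetical, and which dbm backend you get, along with its on-disk format, depends on the platform.

```python
import dbm
import os
import tempfile


class DiskSet:
    """Minimal set-like wrapper over a dbm file: add() and `in` go to disk,
    so RAM use stays small no matter how many keys are stored."""

    def __init__(self, path):
        self._db = dbm.open(path, "c")  # "c": open for read/write, create if missing

    def add(self, key):
        # dbm stores bytes; the value is a dummy marker, only the key matters.
        self._db[key.encode("utf-8")] = b"1"

    def __contains__(self, key):
        return key.encode("utf-8") in self._db

    def close(self):
        self._db.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "words")  # hypothetical location
    with DiskSet(path) as words:
        for w in ("apple", "pear", "apple"):
            words.add(w)
        print("pear" in words, "plum" in words)  # True False
```

Every lookup is a disk access, so this trades speed for memory. For bulk intersection or difference of two very large sets, streaming through sorted files as with comm(1) is likely to be faster.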