Tim Peters wrote:
[Martin MOKREJŠ]
This comm(1) approach doesn't work for me. It somehow fails to
detect common entries when the offset is too big.
[...]
I'll repeat:
As I mentioned before, if you store keys in sorted text files ...
Those files aren't in sorted order, so of cour
Tim Peters wrote:
[Martin MOKREJŠ]
...
I gave up the theoretical approach. Practically, I might need to
store up to maybe those 1E15 keys.
We should work on our multiplication skills here. You don't
have enough disk space to store 1E15 keys. If your keys were just one
byte each, you would ne
Tim Peters wrote:
[Tim Peters]
As I mentioned before, if you store keys in sorted text files,
you can do intersection and difference very efficiently just by using
the Unix `comm` utility.
[Martin MOKREJŠ]
Now I got your point. I understand the comm(1) is written in C, but it still
has to scan
t the proposed code really does.
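
The sorted-file idea quoted above can be sketched in Python as one merge pass over two sorted inputs, which is roughly what comm(1) does with its three output columns. This is only an illustration, not the code proposed in the thread: the function name `comm_merge` and the sample keys are made up, and it compares strings with Python's default ordering, whereas comm(1) follows the locale's collation order.

```python
def comm_merge(a_keys, b_keys):
    """Merge two SORTED iterables of keys in a single pass.

    Returns (only_a, only_b, both), like the three columns of comm(1).
    If either input is not sorted, common entries can be missed.
    """
    only_a, only_b, both = [], [], []
    a_iter, b_iter = iter(a_keys), iter(b_keys)
    a = next(a_iter, None)
    b = next(b_iter, None)
    while a is not None and b is not None:
        if a == b:
            # Same key on both sides: part of the intersection.
            both.append(a)
            a = next(a_iter, None)
            b = next(b_iter, None)
        elif a < b:
            # a cannot appear later in b, because b is sorted.
            only_a.append(a)
            a = next(a_iter, None)
        else:
            only_b.append(b)
            b = next(b_iter, None)
    # One side is exhausted; whatever is left is unique to the other.
    while a is not None:
        only_a.append(a)
        a = next(a_iter, None)
    while b is not None:
        only_b.append(b)
        b = next(b_iter, None)
    return only_a, only_b, both


# Hypothetical sample keys, already sorted.
sorted_result = comm_merge(["apple", "kiwi", "pear"], ["kiwi", "lime", "pear"])

# Same kind of data with the first input out of order: "apple" is missed.
unsorted_result = comm_merge(["pear", "apple"], ["apple"])
```

Each key is read once, so the cost grows with the combined length of the two inputs and memory use stays small when the results are written out instead of collected. The second call shows why unsorted input appears to "fail to detect common entries".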
Scott David Daniels wrote:
Tim Peters wrote:
[Martin MOKREJŠ]
just imagine you want to compare how many words are in English, German,
Czech, and Polish dictionaries. You collect words from every language and
record them in dict or Set, as you wish.
Call the set of all En
Istvan Albert wrote:
Martin MOKREJŠ wrote:
But nevertheless, imagine 1E6 words of size 15. That's maybe 1.5GB of raw
data. Will sets be appropriate you think?
You started out with 20E20, then cut back to 1E15 keys;
now it is down to one million, but you claim that these
will take 1.5 GB.
I ga
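
The size estimate questioned above can be checked with a few lines of arithmetic. This sketch assumes one byte per character and counts raw data only; an in-memory set would need more because of per-object overhead, which is not estimated here.

```python
n_words = 10**6          # one million words, as in the quoted message
bytes_per_word = 15      # "words of size 15", assuming 1 byte per character

raw_bytes = n_words * bytes_per_word
raw_mb = raw_bytes / 10**6            # decimal megabytes

claimed_bytes = 1.5 * 10**9           # the 1.5 GB figure from the message
overestimate = claimed_bytes / raw_bytes

print(raw_bytes, raw_mb, overestimate)
```

Under these assumptions the raw data comes to 15 MB, so the 1.5 GB figure is off by a factor of 100.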
Tim Peters wrote:
[Martin MOKREJŠ]
just imagine you want to compare how many words are in English, German,
Czech, and Polish dictionaries. You collect words from every language and record
them in dict or Set, as you wish.
Call the set of all English words E; G, C, and P similarly.
Once you have
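
With the names E, G, C, and P from the message above, comparisons between the dictionaries become ordinary set operations. A small sketch follows, using Python's built-in `set` type rather than the older `sets.Set` mentioned in the thread. The word lists are made-up placeholders, not real dictionary data, and which operations the message goes on to use is not shown in this excerpt.

```python
# Hypothetical placeholder entries standing in for full word lists.
E = {"hand", "wind", "taxi", "spoon"}   # English
G = {"hand", "wind", "taxi"}            # German
C = {"most", "taxi", "ruka"}            # Czech
P = {"most", "taxi", "noc"}             # Polish

in_all_four = E & G & C & P      # intersection: words in every set
english_and_german = E & G       # words shared by two languages
only_english = E - (G | C | P)   # difference: words in E and nowhere else
czech_xor_polish = C ^ P         # symmetric difference: in exactly one of C, P
total_distinct = len(E | G | C | P)  # size of the union
```

These operators work fine while all four sets fit in memory; for key counts that do not, the sorted-file merge discussed elsewhere in the thread is the alternative.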
Paul McGuire wrote:
"Martin MOKREJŠ" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Hi,
I have sets.Set() objects having up to 20E20 items,
each composed of up to 20 characters. Keeping
them in memory on a 1GB machine quickly puts me into swap.
I don't want to use dictionary approac