Re: Million hash comparisons

Jenda Krynicky Thu, 20 Nov 2008 10:29:30 -0800

From: Nitin Kalra <[EMAIL PROTECTED]>
> All I want to do is to compare 2 hashes having 10-20
> Million key/ value pairs, this means 100-200 Million
> comparisons. Hashes are the best thing which come to
> my mind but storing hashes in memory overloads the
> system and lowers the performance drastically.
> 
> For this you are suggesting me to use DB_file module,
> can you plz elaborate how this can be implemented with
> compromising the system performance and having fast
> comparisons.
> 
> Many thanks
> Nitin


The only changes to the script are to add

use DB_File;

somewhere near the top and change

my %firsthash;
my %secondhash;

to

tie my %firsthash,  'DB_File', $firstfilename;
tie my %secondhash, 'DB_File', $secondfilename;

a little later. The two files ($firstfilename and $secondfilename) 
should not exist beforehand and should, if possible, be on a different 
disk from the millions of files you are comparing.
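
To make the tie step concrete, here is a minimal sketch of one tied hash. The temporary directory, the file name and the sample keys and values are made up for illustration; the flags, mode and $DB_HASH arguments are spelled out only for clarity. It assumes DB_File is installed with your perl.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;                        # exports O_RDWR, O_CREAT, O_RDONLY
use File::Temp qw(tempdir);

# Hypothetical location: in a real run, point this at a directory on a
# different disk than the files being compared.
my $dir           = tempdir(CLEANUP => 1);
my $firstfilename = "$dir/first.db";

# Back %firsthash with an on-disk hash file instead of keeping it in memory.
tie my %firsthash, 'DB_File', $firstfilename, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $firstfilename: $!";

# From here on the hash is used like any other hash.
$firsthash{'a.txt'} = 'checksum-1';
$firsthash{'b.txt'} = 'checksum-2';

# each() walks the entries one at a time; keys() in list context would
# pull every key into memory at once.
my $count = 0;
while (my ($key, $value) = each %firsthash) {
    $count++;
}

untie %firsthash;                 # flush and close the file
```

Keep in mind that a tied DB_File hash stores plain strings only, so nested data structures would have to be serialized first.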

OTOH, are you sure you need to build both %hashes and THEN compare 
them? I think it should be enough to build one and then, as you go 
through the second batch of files, compare each item you generate 
against that %hash right away.
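
A rough sketch of that single-hash approach follows. The two arrays stand in for whatever your script generates from the two batches of files, and the three report lists are my assumption about what you want out of the comparison. In the real script %firsthash could just as well be tied to DB_File.

```perl
use strict;
use warnings;

# Stand-ins for the two batches; the real script would produce these
# key/value pairs while reading the files.
my @first_batch  = ( ['a.txt', 'sum1'], ['b.txt', 'sum2'], ['c.txt', 'sum3'] );
my @second_batch = ( ['a.txt', 'sum1'], ['b.txt', 'sumX'], ['d.txt', 'sum4'] );

# Build the hash from the first batch only.
my %firsthash;
$firsthash{ $_->[0] } = $_->[1] for @first_batch;

# Compare each item of the second batch as soon as it is generated;
# no second hash is ever built.
my (@changed, @only_in_second);
for my $pair (@second_batch) {
    my ($key, $value) = @$pair;
    if (!exists $firsthash{$key}) {
        push @only_in_second, $key;
    }
    elsif ($firsthash{$key} ne $value) {
        push @changed, $key;
    }
    delete $firsthash{$key};      # mark as seen
}

# Whatever was never seen exists only in the first batch.
my @only_in_first = sort keys %firsthash;

print "changed: @changed\n";
print "only in second: @only_in_second\n";
print "only in first: @only_in_first\n";
```

This way only one set of 10-20 million entries has to be stored, and each entry is looked up exactly once.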

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

