From: Nitin Kalra <[EMAIL PROTECTED]>
> Hi Community,
> 
> In a Perl script of mine I have to compare two
> 8M-10M files (each), which means 80-90M searches.
> As a normal procedure (up to 1M) I use hashes, but
> beyond 1M system performance degrades drastically.

You mean you compute MD5 (or something similar) hashes of the records 
and store them in a %hash for the first step of the file comparison? 
Maybe the %hash grows too big to fit in memory together with the 
other stuff you need and forces the OS to start paging. If this is 
the case you may get better performance by storing the %hash on disk 
using DB_File or a similar module. Apart from the %hash declaration 
this should not require any changes to your code, but it will 
drastically lower the memory footprint. Of course, for the cases 
where the %hash would fit in memory this will be slower.
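
For illustration, a minimal sketch of what I mean (the file names, 
the seen.db name and the one-record-per-line format are just 
assumptions about your setup; the DB_File tie is the important part):

    use strict;
    use warnings;
    use DB_File;
    use Fcntl qw(O_RDWR O_CREAT);
    use Digest::MD5 qw(md5);

    # Tie %seen to an on-disk Berkeley DB file instead of keeping
    # it all in RAM; only this declaration changes, the rest of the
    # code uses %seen like any other hash.
    my %seen;
    tie %seen, 'DB_File', 'seen.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie %seen to seen.db: $!";

    # First pass: remember an MD5 digest of every record in file A.
    # (Keying on md5($line) keeps the DB small; you could just as
    # well key on the raw lines.)
    open my $fa, '<', 'fileA.txt' or die "fileA.txt: $!";
    while (my $line = <$fa>) {
        chomp $line;
        $seen{ md5($line) } = 1;
    }
    close $fa;

    # Second pass: report the records of file B already seen in A.
    open my $fb, '<', 'fileB.txt' or die "fileB.txt: $!";
    while (my $line = <$fb>) {
        chomp $line;
        print "common: $line\n" if exists $seen{ md5($line) };
    }
    close $fb;

    untie %seen;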

HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

