Jenda Krynicky wrote:
> From: Nitin Kalra <[EMAIL PROTECTED]>
>>
>> In a Perl script of mine I have to compare two 8M-10M
>> files (each), which means 80-90M searches. As a normal
>> procedure (up to 1M) I use hashes, but going beyond 1M
>> system performance degrades drastically.
> 
> You mean you compute MD5 (or something similar) hashes of the files 
> and store them in a %hash for the first step of the file comparison? 
> Maybe the %hash grows too big to fit in memory together with the 
> other stuff you need and forces the OS to start paging. If this is 
> the case you may get better performance by storing the %hash on disk 
> using DB_File or a similar module. Apart from the %hash declaration 
> this should not force any changes to your code, but it will drastically 
> lower the memory footprint. Of course, for the cases where the 
> %hash would fit in memory this will be slower.
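For anyone who wants to try that, here is a minimal sketch of the tied-hash
approach Jenda describes. The file names and the one-line-per-record key
scheme are just assumptions for illustration; only the tie() call changes
compared with an ordinary in-memory %hash:

    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    # Tie the hash to an on-disk B-tree; keys and values live in
    # seen.db on disk instead of in RAM.
    tie my %seen, 'DB_File', 'seen.db', O_RDWR|O_CREAT, 0644, $DB_BTREE
        or die "Cannot tie hash to seen.db: $!";

    # Load the first file: one key per line (assumed record format).
    open my $fh, '<', 'file1.txt' or die "Cannot open file1.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1;      # same syntax as an in-memory hash
    }
    close $fh;

    # Scan the second file and report records present in both.
    open $fh, '<', 'file2.txt' or die "Cannot open file2.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        print "common: $line\n" if exists $seen{$line};
    }
    close $fh;

    untie %seen;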

AFAIK, regardless of the size of the source data, calculating an MD5 needs only
its fixed internal state: the 128-bit digest plus a working copy and a small
input buffer. Let's say 64 bytes to be safe. It's not going to challenge the
memory of any current PC.
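
You can see this with Digest::MD5, which streams a file through the digest
via a filehandle without ever holding the whole file in memory (the file
name here is made up):

    use strict;
    use warnings;
    use Digest::MD5;

    open my $fh, '<', 'bigfile.dat' or die "Cannot open bigfile.dat: $!";
    binmode $fh;

    # addfile() reads the handle in chunks, so memory use stays
    # constant no matter how large the file is.
    my $md5 = Digest::MD5->new;
    $md5->addfile($fh);
    print $md5->hexdigest, "\n";   # 32 hex chars == 128 bits
    close $fh;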

Rob
