Jenda Krynicky wrote:
> From: Nitin Kalra <[EMAIL PROTECTED]>
>>
>> In a Perl script of mine I have to compare two 8M-10M files (each),
>> which means 80-90M searches. As a normal procedure (up to 1M) I use
>> hashes, but going beyond 1M system performance degrades drastically.
>
> You mean you compute MD5 (or something similar) hashes of the files
> and store them in a %hash for the first step of the file comparison?
> Maybe the %hash grows too big to fit in memory together with the
> other stuff you need and forces the OS to start paging. If this is
> the case you may get better performance by storing the %hash on disk
> using DB_File or a similar module. Apart from the %hash declaration
> this should not force any changes to your code, but it will
> drastically lower the memory footprint. Though of course, for the
> cases where the %hash would fit in memory, this will be slower.
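For reference, what Jenda describes is just a tie. A minimal sketch, assuming DB_File is available and a digest-to-filename layout (the filename hashes.db is made up):

    use strict;
    use warnings;
    use Fcntl;      # O_RDWR, O_CREAT
    use DB_File;    # exports $DB_HASH

    # Tie %hash to an on-disk Berkeley DB file instead of keeping it
    # all in RAM; the rest of the script can keep using %hash unchanged.
    my %hash;
    tie %hash, 'DB_File', 'hashes.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie hashes.db: $!";

    # Example entry: MD5 of the empty string, mapped to a filename.
    $hash{'d41d8cd98f00b204e9800998ecf8427e'} = 'empty.txt';

    untie %hash;

Reads and writes now go through the DB, so memory use stays flat no matter how many keys you store.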
AFAIK, regardless of the size of the source data, an MD5 digest is only 128 bits (16 bytes). Even allowing for a secondary copy and some per-entry overhead, let's say 64 bytes per file to be safe. That's not going to challenge the memory of any current PC.

Rob
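A quick way to see the fixed digest size with Digest::MD5 (a minimal sketch; the filename is made up):

    use strict;
    use warnings;
    use Digest::MD5;

    # 'some_big_file.dat' is a hypothetical name; any file works.
    open my $fh, '<:raw', 'some_big_file.dat' or die "open: $!";
    my $md5 = Digest::MD5->new->addfile($fh)->digest;
    close $fh;

    # The digest is 128 bits no matter how large the file was.
    print length($md5), " bytes\n";   # prints "16 bytes"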