Package: perforate
Version: 1.2-5
Severity: important

Hi,

The subject says it all: finddup eats all memory. Looking into it, the cause is finddup's use of Digest::MD5. It seems the author never tested with large files or with lots of files (about a terabyte of disk space in total). Dropping the "addfile" call and feeding the digest from a while loop instead has two effects:
 - It no longer eats any noticeable amount of memory (other than what is needed for the file list).
 - It is noticeably faster, for whatever reason.

--- /usr/bin/finddup 2006-08-18 23:09:57.000000000 +0200
+++ /home/joerg/finddup 2007-11-07 00:33:01.827142588 +0100
@@ -131,11 +131,19 @@
sub insert_md5
{
my $file = shift;
+ my $data;
+
if (open(IN, "<", $file->[4]->[0]))
{
- my $md5 = Digest::MD5->new->addfile(*IN)->hexdigest;
- $md5 .= "\t".$file->[1]."\t".$file->[2]."\t".$file->[3] unless $opt->{'ignore-perms'};
+ my $check = Digest::MD5->new;
+ while (sysread(IN, $data, 8192))
+ {
+ $check->add($data);
+ }
close IN;
+ my $md5 = $check->hexdigest;
+
+ $md5 .= "\t".$file->[1]."\t".$file->[2]."\t".$file->[3] unless $opt->{'ignore-perms'};
$md5list{$md5} = [] unless exists $md5list{$md5};
push @{$md5list{$md5}}, $file;
}
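
For anyone who wants to try the chunked read outside of finddup, here is a minimal standalone sketch of the same idea. The name md5_hex_chunked and its chunk-size parameter are made up for illustration and are not part of finddup. Unlike the patch, the sketch uses a lexical filehandle, sets binmode, and treats a failed sysread as an error instead of as end of file.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper (not part of finddup): return the MD5 hex digest of a
# file, feeding Digest::MD5 one fixed-size chunk at a time.
# Returns undef if the file cannot be opened or a read fails.
sub md5_hex_chunked {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 8192;    # same buffer size as in the patch

    open(my $in, '<', $path) or return;
    binmode($in);            # hash raw bytes, no layer translation

    my $ctx = Digest::MD5->new;
    my $data;
    while (1) {
        my $n = sysread($in, $data, $chunk_size);
        if (!defined $n) {   # read error
            close $in;
            return;
        }
        last if $n == 0;     # end of file
        $ctx->add($data);    # only one chunk is held in memory at a time
    }
    close $in;
    return $ctx->hexdigest;
}
```

Called as md5_hex_chunked($file->[4]->[0]) it should give the same digest as the loop in the patch, assuming the file is readable.
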
-- 
bye Joerg
Some AM to his NM on [11 Aug. 2004]:
You already won't get through Front Desk and Account Manager approvals before sarge,[...]
[Note: He made it! :) ]

