Hi everyone, I could use some advice on a Perl script I wrote using hashes. I have three files (each file is a list of indexes); my program loads these indexes into hashes and compares the differences and similarities between them. With smaller files it runs fine. The problem is that the files now have about 88 million records each, and the script has been running for days. I'm not sure of the best way to resolve the issue (pieces of the code are below). One suggestion that was given to me was to load the first file into a hash and then, as I open the next file, immediately compare it against the first hash for similarities and differences. I can't really get my head around how to do that. Is there a simple way to compare three very large hashes without so much demand on memory? The program manages to load the large hashes without problems, and as I mentioned, the smaller files have no issues. Any suggestions folks have are much appreciated. Thanks, R
#!/usr/bin/perl
use strict;
use warnings;

### Load file content into a hash
my $filename = '/tmp/test.txt';
open my $fh, "<", $filename or die $!;
my %hash = map { /(^ABC*.*?)\n(.*)/ } <$fh>;

# Get the hash size
my $hash_size = keys %hash;
print "The hash contains $hash_size elements.\n";
close $fh or die $!;

### Intersection subroutine
sub intersection {
    my ($hasha, $hashb) = @_;
    my %newhash;
    foreach my $key (keys %{$hasha}) {
        $newhash{$key} = $$hasha{$key} if exists $$hashb{$key};
    }
    # Don't return %newhash, just report its size
    my $newhash_size = keys %newhash;
    print "The intersected hash contains $newhash_size elements.\n";
}

### Differences between hashes
sub in_one_not_in_both {
    # Find keys from one hash that aren't in the other
    my ($hash3, $hash4) = @_;
    my %newhash2;
    foreach my $key2 (keys %{$hash3}) {
        $newhash2{$key2} = $$hash3{$key2} unless exists $$hash4{$key2};
    }
    my $newhash_size2 = keys %newhash2;
    print "This is the number of unique values: $newhash_size2\n";
}
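Here is my rough, untested attempt at sketching the suggestion I was given, assuming one index per line; the file names below are just placeholders, not my real paths. Only the first file is held in a hash; the second is read line by line and compared on the fly, so there is never a second 88-million-key hash in memory. Does this look like the right direction? I'm guessing the third file could get the same streaming treatment in another pass.

#!/usr/bin/perl
use strict;
use warnings;

# Placeholder file names -- substitute the real index files
my ($file1, $file2) = ('/tmp/file1.txt', '/tmp/file2.txt');

# Load only the first file into a hash (assumes one index per line)
open my $fh1, "<", $file1 or die "$file1: $!";
my %seen;
while (my $line = <$fh1>) {
    chomp $line;
    $seen{$line} = 0;          # 0 = not yet matched by file 2
}
close $fh1 or die $!;

# Stream the second file and compare on the fly -- no second big hash
my ($in_both, $only_in_2) = (0, 0);
open my $fh2, "<", $file2 or die "$file2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    if (exists $seen{$line}) {
        $in_both++ unless $seen{$line};   # count each shared key once
        $seen{$line} = 1;                 # mark as matched
    }
    else {
        $only_in_2++;
    }
}
close $fh2 or die $!;

# Anything still marked 0 appeared only in file 1
my $only_in_1 = grep { !$_ } values %seen;

print "In both files: $in_both\n";
print "Only in $file1: $only_in_1\n";
print "Only in $file2: $only_in_2\n";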