Hi everyone, I could use some advice on a Perl script I wrote using hashes. I
have three files (each file is a list of indexes); my program loads these
indexes into hashes and compares the differences and similarities between
them. With smaller files it runs fine. The problem is that the files now have
about 88 million records each, and the script has been running for days. I'm
not sure of the best way to resolve this (pieces of my current code are
below). One suggestion I was given was to load the first file into a hash and
then, while reading the second file, immediately compare each of its records
against that first hash for similarities and differences, instead of building
a second full hash. I can't really get my head around how to do that; my
rough, untested attempt at the idea is right after this message, before my
current code. Is there a simple way to compare three very large hashes without
so much demand on memory? The program manages to load the large hashes without
problems, and as I mentioned, smaller files are no issue. Any suggestions are
much appreciated.
Thanks R
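
For what it's worth, here is my rough, untested attempt at what I think that
suggestion means. The file names are placeholders and I haven't tried this on
the real data, so I may well have the idea wrong:

#!/usr/bin/perl
use strict;
use warnings;

# Placeholder paths -- the real files are the 88-million-record index lists.
my $file_a = '/tmp/listA.txt';
my $file_b = '/tmp/listB.txt';

# Load only the first file into a hash; a value of 0 means
# "not yet seen in the second file".
my %seen_in_a;
open my $fh_a, '<', $file_a or die "$file_a: $!";
while ( my $line = <$fh_a> ) {
    chomp $line;
    $seen_in_a{$line} = 0;
}
close $fh_a or die $!;

# Stream the second file and compare each record against the first hash
# instead of building a second huge hash.
my ( $in_both, $only_in_b ) = ( 0, 0 );
open my $fh_b, '<', $file_b or die "$file_b: $!";
while ( my $line = <$fh_b> ) {
    chomp $line;
    if ( exists $seen_in_a{$line} ) {
        $in_both++ unless $seen_in_a{$line};    # count each shared index once
        $seen_in_a{$line} = 1;                  # mark it as seen in file two
    }
    else {
        $only_in_b++;
    }
}
close $fh_b or die $!;

# Whatever is still marked 0 never showed up in the second file.
my $only_in_a = grep { !$_ } values %seen_in_a;

print "In both files:       $in_both\n";
print "Only in first file:  $only_in_a\n";
print "Only in second file: $only_in_b\n";

I'm guessing the third file could be streamed the same way against whichever
of these sets I need, but I'm not sure. My current code pieces follow.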

#!/usr/bin/perl
use strict;
use warnings;


### loading file content into hashes

my $filename = '/tmp/test.txt';
open my $fh, "<", $filename or die $!;

# Every line starting with "AB"/"ABC..." becomes a key (the line minus its
# trailing newline); the second capture is empty, so the values are ''.
my %hash = map { /(^ABC*.*?)\n(.*)/ } <$fh>;

# get the hash size
my $hash_size = keys %hash;

print "The hash contains $hash_size elements.\n";

close $fh or die $!;

###### intersect subroutine

sub intersection
{
    my ($hasha, $hashb) = @_;
    my %newhash;

    # keep only the keys of %$hasha that also exist in %$hashb
    foreach my $key (keys %{$hasha})
    {
        $newhash{$key} = $hasha->{$key} if exists $hashb->{$key};
    }

    # don't return %newhash, just grab its size
    my $newhash_size = keys %newhash;

    print "The intersected hash contains $newhash_size elements.\n";
}

################ differences between hashes

sub in_one_not_in_both
{
    # find the keys from the first hash that aren't in the second
    my ($hash3, $hash4) = @_;
    my %newhash2;

    foreach my $key2 (keys %{$hash3})
    {
        $newhash2{$key2} = $hash3->{$key2} unless exists $hash4->{$key2};
    }

    my $newhash_size2 = keys %newhash2;

    print "This is the number of unique values: $newhash_size2\n";
}
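
To show how these get wired up, here is a tiny stand-alone example with
placeholder hashes (the real ones are the three big hashes loaded from the
index files, about 88 million keys each):

# Stand-in hashes just to illustrate the calls.
my %hash_a = ( ABC1 => '', ABC2 => '', ABC3 => '' );
my %hash_b = ( ABC2 => '', ABC3 => '', ABC4 => '' );

intersection( \%hash_a, \%hash_b );          # prints a size of 2
in_one_not_in_both( \%hash_a, \%hash_b );    # prints 1 (ABC1 is only in %hash_a)
in_one_not_in_both( \%hash_b, \%hash_a );    # prints 1 (ABC4 is only in %hash_b)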
