On Wed, Sep 1, 2010 at 07:55, Kryten <kryte...@googlemail.com> wrote:
> Hi,
>
> I'm very much a beginner.
>
> Could anyone point me in the right direction on how to accomplish the
> following, please?
>
> I have a fairly long log file call it file A, it has around 20,000
> lines of three element space separated variables.
>
> File A looks like:-
>
> 55223 jimmy smith
> 55224 davy crocket
> 55227 walter mitty
> 63256 mickey mouse
> ..
>
> I also have a .txt file with around 600 of the numbers found in file
> A. Cal that File B.
>
> File B looks like:-
>
> 63256
> 55223
> ..
>
> I need to (quickly as possible) remove all the lines from File A,
> whose numbers can be found in File B.
>
> Now, I have this working just fine in Windows Powershell, but it
> really slow, as I am foreaching file b into a where-object filter, it
> works but takes minutes to run. Based on my limited exposure to perl,
> I'm pretty certain that TMTOWTDI !!
snip

What you need is a hash set.  A hash set is a hash where you only care
about the existence of the keys, not the values associated with those
keys.  If you create a hash set of the items in file B, you can read
each line from file A and check whether its first field is in the hash;
if it is, don't print the line.  This is about as fast as it can get:
the hash lookup is amortized O(1) and each file is read only once, so
the runtime is O(m+n), where m is the number of lines in file A and n
is the number of lines in file B.
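
As a small illustration of the idiom on its own, here is a minimal
sketch.  The ids are taken from the example in the question and the
names @ids and %seen are just placeholders.  It shows why the test has
to be exists rather than a plain truth test when the values are undef.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Placeholder ids, taken from the example in the question.
my @ids = (63256, 55223);

# Build the hash set: only the keys matter, so the values stay undef.
my %seen;
@seen{@ids} = ();

# exists asks "is this key present?" and works even though the value is undef.
my $present = exists $seen{55223} ? "yes" : "no";
my $absent  = exists $seen{55224} ? "yes" : "no";

# A plain truth test looks at the value, so it misses every key in this set.
my $truthy  = $seen{55223} ? "yes" : "no";

print "55223 in set: $present\n";
print "55224 in set: $absent\n";
print "55223 by truth test: $truthy\n";
```

If you store 1 instead of undef, the truth test works too, at the cost
of a little more memory per key.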

#!/usr/bin/perl

use strict;
use warnings;

die "usage: $0 id_file data_file\n" unless @ARGV == 2;

my ($id_file, $data_file) = @ARGV;

open my $ids, "<", $id_file
        or die "could not open $id_file: $!\n";

open my $data, "<", $data_file
        or die "could not open $data_file: $!\n";

my %remove;
while (my $id = <$ids>) {
        chomp $id;
        #using undef instead of 1 because it takes up less room
        $remove{$id} = undef;
}

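while (<$data>) {
        #split with no arguments splits $_ on whitespace; keep the first field
        my ($id) = split;
        #blank lines have no first field, so print them unchanged
        print unless defined $id and exists $remove{$id};
}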
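
If you want to try the logic without creating any files, here is a
rough sketch that runs the same filter over the sample lines from the
question.  The filter_lines helper and the @file_a and @file_b arrays
are my own stand-ins for illustration, not part of the script above.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# filter_lines is a hypothetical helper for this sketch only.
# It returns the lines whose first field is not among the given ids.
sub filter_lines {
    my ($ids, $lines) = @_;

    # Hash set of ids to remove; the values are left as undef.
    my %remove;
    @remove{@$ids} = ();

    return grep {
        my ($id) = split;
        # Keep blank lines and any line whose id is not in the set.
        !(defined $id && exists $remove{$id});
    } @$lines;
}

# Sample data copied from the question.
my @file_a = (
    "55223 jimmy smith\n",
    "55224 davy crocket\n",
    "55227 walter mitty\n",
    "63256 mickey mouse\n",
);
my @file_b = (63256, 55223);

my @kept = filter_lines(\@file_b, \@file_a);
print @kept;
```

With the sample data this prints the lines for 55224 and 55227, which
are the two ids that do not appear in file B.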


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
