On Wed, Sep 1, 2010 at 07:55, Kryten <kryte...@googlemail.com> wrote:
> Hi,
>
> I'm very much a beginner.
>
> Could anyone point me in the right direction on how to accomplish the
> following, please?
>
> I have a fairly long log file call it file A, it has around 20,000
> lines of three element space separated variables.
>
> File A looks like:-
>
> 55223 jimmy smith
> 55224 davy crocket
> 55227 walter mitty
> 63256 mickey mouse
> ..
>
> I also have a .txt file with around 600 of the numbers found in file
> A. Cal that File B.
>
> File B looks like:-
>
> 63256
> 55223
> ..
>
> I need to (quickly as possible) remove all the lines from File A,
> whose numbers can be found in File B.
>
> Now, I have this working just fine in Windows Powershell, but it
> really slow, as I am foreaching file b into a where-object filter, it
> works but takes minutes to run. Based on my limited exposure to perl,
> I'm pretty certain that TMTOWTDI !!
snip

What you need is a hash set. A hash set is a hash where you only care about the existence of the keys, not the values associated with those keys. If you build a hash set from the numbers in file B, you can read file A one line at a time, check whether its first field is a key in the hash, and print the line only if it is not.

You are unlikely to do much better than this: a hash lookup is O(1) on average and each file is read only once, so the runtime is O(m+n), where m is the number of lines in file A and n is the number of lines in file B.

Pass file B (the ids) as the first argument and file A (the data) as the second; the lines that survive go to standard output.

#!/usr/bin/perl

use strict;
use warnings;

die "usage: $0 id_file data_file\n" unless @ARGV == 2;

my ($id_file, $data_file) = @ARGV;

open my $ids, "<", $id_file
    or die "could not open $id_file: $!\n";
open my $data, "<", $data_file
    or die "could not open $data_file: $!\n";

# build the hash set of ids to remove (file B)
my %remove;
while (my $line = <$ids>) {
    # split ' ' drops the newline and any stray whitespace
    my ($id) = split ' ', $line;
    next unless defined $id;    # skip blank lines

    # using undef instead of 1 because it takes up less room
    $remove{$id} = undef;
}

# print every line of file A whose first field is not in the set
while (my $line = <$data>) {
    my ($id) = split ' ', $line;
    print $line unless defined $id and exists $remove{$id};
}

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.