On Mar 26, 12:27 pm, [EMAIL PROTECTED] (Tom Phoenix) wrote:
> On Wed, Mar 26, 2008 at 8:18 AM,  <[EMAIL PROTECTED]> wrote:
> > I have two sorted files (one string per line).
> >  [I'd also like to know how to solve this if the lists weren't sorted
> >  (as complemented sets).]
> >  I want to output the List1 items not found in the List2 file.
> >  grep is too slow.
> >  diff gets stuck because list2 has millions of items.
>
> If the lists aren't sorted, it's probably best to read the second list
> (the list of filters) into a hash. But since they're sorted, and
> because you have many filters, it's more efficient to read the files
> in parallel.
>
> My first draft of this program used this line to implement the inner loop:
>
>     $current_filter = <FILTERS> while $item gt $current_filter;
>
> ... but then I realized that the second file could run out of filters
> before the first one runs out of data, so it had to become more
> complex:
>
>     #!/usr/bin/perl
>
>     use strict;
>     use warnings;
>
>     die "huh?" unless @ARGV == 2;
>     my($data_file, $filters) = @ARGV;
>
>     # three-argument open: the file name is never parsed as a mode or pipe
>     open DATA_FILE, '<', $data_file or die "Can't read '$data_file': $!";
>     open FILTERS, '<', $filters or die "Can't read '$filters': $!";
>
>     my $current_filter = '';
>
>     # outer loop reads a line at a time
>   DATA_LINE:
>     while (my $item = <DATA_FILE>) {
>
>       # inner loop updates the filter, if needed
>       # This inner loop would be just this line:
>       ### $current_filter = <FILTERS> while $item gt $current_filter;
>       # .... except that we have to allow for the filters to run out.
>       while ($item gt $current_filter) {
>         if (defined($current_filter = <FILTERS>)) {
>           # a filter was read from the file: normal case
>         } else {
>           # No more filters; print everything else
>           print $item;
>           print while <DATA_FILE>;
>           last DATA_LINE;
>         }
>       }
>
>       # the inner loop has now updated $current_filter
>       print $item unless $item eq $current_filter;
>     }
>
> Hope this helps!
>
> --Tom Phoenix
> Stonehenge Perl Training


Works great, thanks.
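
For the unsorted case, I take the hash suggestion to mean roughly the following. This is only a sketch: items_not_in is a name I made up, the sample lists are invented, and it assumes the whole second list fits in memory as hash keys (which may be a stretch with millions of items).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Tom's program): return the items of
# @$items that do not appear in @$filters. Neither list has to be sorted.
sub items_not_in {
    my ($items, $filters) = @_;
    my %seen;
    @seen{@$filters} = ();    # hash slice: only the keys matter
    return grep { !exists $seen{$_} } @$items;
}

# Small in-memory demonstration with made-up data. Real use would read
# the two files into the arrays instead.
my @list1 = qw(pear apple fig kiwi);
my @list2 = qw(kiwi apple plum);
print "$_\n" for items_not_in(\@list1, \@list2);
```

The output keeps the order of the first list, so nothing has to be sorted before or after.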

One more thing if I may:

How do I modify the code so that it works as is with two arguments
(perlscr list1.txt list2.txt), but reads the data file from stdin when
only one argument is given (cat list1.txt | perlscr list2.txt)?
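
My rough guess is something like the sketch below. The open_inputs name is my own invention, and it assumes the filter file is always the last argument. The two handles it returns would stand in for DATA_FILE and FILTERS in the loop. I'm not sure this is the idiomatic way:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the argument list into two read handles.
# One argument:  data comes from STDIN, the argument is the filter file.
# Two arguments: the first is the data file, the second the filter file.
sub open_inputs {
    my @args = @_;
    die "usage: $0 [data_file] filter_file\n"
        unless @args == 1 || @args == 2;

    my $filters = pop @args;    # assumption: filter file is always last
    my $data_fh;
    if (@args) {
        open($data_fh, '<', $args[0])
            or die "Can't read '$args[0]': $!";
    } else {
        $data_fh = \*STDIN;     # no data file named: read standard input
    }
    open(my $filter_fh, '<', $filters)
        or die "Can't read '$filters': $!";
    return ($data_fh, $filter_fh);
}

# Only touch the files when arguments were actually given.
if (@ARGV) {
    my ($data_fh, $filter_fh) = open_inputs(@ARGV);
    # The merge loop would go here, reading <$data_fh> and <$filter_fh>.
}
```

The idea is that both "perlscr list1.txt list2.txt" and "cat list1.txt | perlscr list2.txt" end up with the same pair of handles.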

Thanks again.

