Re: How to match next line?

Tagore Smith Wed, 22 May 2002 10:55:37 -0700

Scot Robnett wrote:


> I am concatenating several e-mail lists into one, and some of the addresses
> are on multiple lists. I need to be able to print out a full, alphabetically
> sorted list of all the names without printing duplicates. How would I alter
> the following to make that happen?
>
> This gets close; it *will* remove a duplicate address if there are only two
> of the same address. But it won't work if there are three or four of the
> same address, for example.
>
> I'm also sure this could be shortened or cleaned up; suggestions welcome....

perldoc -q duplicate will get you to the FAQ entry for this. Some of the
solutions there are a little obscure, relying on grep, slices, etc.

A simple way of doing it is just to put the values in a hash, as hash keys
must be unique:

my %resulthash=();

foreach my $address(@slurped){
    $resulthash{$address}++;
}

my @result=sort(keys %resulthash);

This could be made more compact (and in fact is essentially what some of the
FAQ answers are doing, just more verbosely), but it's pretty clear this way.
It also stores the number of times each address occurred as the values of
the hash. It does wind up using a lot of extra space if your list is large,
since the addresses are held in both the array and the hash. You could get
around this by just reading the addresses straight into a hash.
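
Reading straight into a hash might look something like the sketch below. The
helper name count_addresses and the sample addresses are made up for
illustration, and the in-memory filehandle (which needs Perl 5.8 or later) is
only there so the example runs without a real list file. It also checks the
result of open.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not from the original post): reads one address per
# line from an open filehandle and returns a reference to a hash whose keys
# are the distinct addresses and whose values are occurrence counts.
sub count_addresses {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';    # ignore blank lines
        $count{$line}++;
    }
    return \%count;
}

# Made-up sample data in an in-memory file; with a real list you would
# open the file by name instead.
my $data = "bob\@example.com\nalice\@example.com\nbob\@example.com\nbob\@example.com\n";
open(my $in, '<', \$data) or die "Cannot open sample data: $!";
my $count = count_addresses($in);
close($in);

# Hash keys are unique, so sorting them gives the de-duplicated list.
my @result = sort keys %$count;
print "$_ ($count->{$_})\n" for @result;
```

Only the hash is kept in memory here, so each distinct address is stored once
no matter how many times it appears in the input.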

If you want to do it in a way similar to what you're doing below, you could
try:

my $lastseen='';
foreach my $address(@sorted){
    # or push onto a new array, or write to a file
    print "$address\n" unless ($address eq $lastseen);
    $lastseen=$address;
}
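
Here is a self-contained sketch of the same idea that pushes onto a new array
instead of printing, using made-up addresses with one of them repeated three
times. Because each element is compared against the last one seen rather than
only its immediate neighbour by index, runs of any length collapse to a single
entry.

```perl
#!/usr/bin/perl -w
use strict;

# Made-up sample addresses; one of them appears three times.
my @addresses = (
    'bob@example.com',
    'alice@example.com',
    'bob@example.com',
    'bob@example.com',
    'alice@example.com',
);

# Sorting puts identical addresses next to each other.
my @sorted = sort @addresses;

my @unique;
my $lastseen = '';
foreach my $address (@sorted) {
    # Keep the address only if it differs from the one just before it.
    push @unique, $address unless $address eq $lastseen;
    $lastseen = $address;
}

print "$_\n" for @unique;
```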

A couple of things about your code:

Usually in Perl you don't need to use the C-style for loop unless you
actually need to know what index you're looking at. Also, you don't need to
use a regular expression to compare addresses; eq is probably what you want.
In fact, using a regular expression without anchors will consider two email
addresses like [EMAIL PROTECTED] and [EMAIL PROTECTED] to be
duplicates (or [EMAIL PROTECTED] and [EMAIL PROTECTED], depending on
whether you're looking ahead or back in the sorted list). Also, you should
probably check to make sure that your call to open succeeded.
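
To make the regular expression problem concrete, here is a small sketch. The
two addresses are hypothetical ones chosen so that one is a prefix of the
other; they are not the addresses from the original message.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical addresses that would sit next to each other in a sorted list.
my $previous = 'ann@example.co';
my $current  = 'ann@example.com';

# Unanchored match: succeeds because $previous occurs inside $current
# (and the dots in the pattern would match any character as well).
my $regex_says_duplicate = ($current =~ /$previous/) ? 1 : 0;

# String equality: true only when the two addresses are identical.
my $eq_says_duplicate = ($current eq $previous) ? 1 : 0;

print "unanchored regex: ", ($regex_says_duplicate ? "duplicate" : "distinct"), "\n";
print "eq:               ", ($eq_says_duplicate    ? "duplicate" : "distinct"), "\n";
```

With the regular expression, the longer address would be silently dropped as a
duplicate; with eq both are kept.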

Anyway, the answers in the FAQ are more Perlish (and probably better :) ).
The solutions I wrote are more verbose, but maybe more immediately
comprehensible if you're not used to grep or slicing.
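
For comparison, the grep and slice idioms look roughly like this. This is a
sketch from memory of the general style, not a quote from the FAQ, and the
sample addresses are made up.

```perl
#!/usr/bin/perl -w
use strict;

# Made-up sample data, as if slurped from a file one address per line.
my @slurped = (
    "bob\@example.com\n",
    "alice\@example.com\n",
    "bob\@example.com\n",
    "bob\@example.com\n",
);
chomp(@slurped);

# grep idiom: keep an element only the first time it is seen.
my %seen;
my @first_seen = grep { !$seen{$_}++ } @slurped;
my @by_grep    = sort @first_seen;

# Hash slice idiom: create one key per address, values left undefined.
my %unique;
@unique{@slurped} = ();
my @by_slice = sort keys %unique;

print "grep:  @by_grep\n";
print "slice: @by_slice\n";
```

Both give the same sorted list; the grep version also preserves the original
order in @first_seen if you need that instead.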

> #!/usr/bin/perl -w
>
> use strict;
> my $infile = '/path/to/.biglist.list';
> my @slurped = ();
> open(IN,"<$infile");
> while(<IN>) {
>  @slurped = <IN>; # Pull addresses into an array
> }
> close(IN);
>
> my @sorted = sort(@slurped); # Create a sorted list
> my $i;
> my $j;
> foreach ($i = 0; $i <= @sorted; $i++) {
>
>  chomp($i);
>  $j = $i++; # Only finds 1 address beyond $i
>
>  if ($sorted[$i] =~ /$j/) # Compare this line w/next line
>  {
>   next; # If it matches, skip it
>  }
>
>  else {
>   print "Address = $sorted[$i] \n";
>  }
>
> }

Tagore Smith



