Re: Removing duplicate IDs

John W. Krahn Mon, 08 May 2006 20:26:18 -0700

FlashMX wrote:
> Hi,

Hello,


> I have a script that sorts the IDs with numerals first followed by
> alphas. Everything works ok except that there are some IDs in the
> input file that are duplicates. How can I "omit" any duplicate IDs
> from the output file?
> 
> Below is my script and a sample input and output file generated
> from the below. Notice in the output file I have duplicate IDs for
> the following:
> 
> 2, 5, 15, ab, bcm fa
> 
> I only want the output file to contain the first occurrence and toss
> the rest out. How can I do this?

Use a hash.  :-)
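
As a minimal sketch of that idea (the sample IDs below are made up for
illustration, not taken from your input file), a hash can record every ID
already seen, so that grep keeps only the first occurrence of each:

```perl
use strict;
use warnings;

# Hypothetical sample IDs; not taken from the original input file.
my @ids = qw(2 5 2 ab 15 5 ab fa);

my %seen;
# $seen{$_}++ returns the old count: 0 (false) the first time an ID
# appears, so the negation is true and grep keeps it. Later copies see
# a non-zero count and are dropped.
my @unique = grep { !$seen{$_}++ } @ids;

print "@unique\n";    # prints "2 5 ab 15 fa"
```

The order of the list is preserved, which is what makes "keep the first
occurrence" work.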


> #!/usr/local/bin/perl
> 
> require 5.000;
> 
> my %tags = ();
> 
> my $input = $ARGV[0];
> my $output = $ARGV[1];
> 
> open (FILE, "< $input") or die "cannot open $input: $!\n";
>   open (OUTPUTFILE, "> $output");

You should verify that OUTPUTFILE was opened successfully, as you already do for FILE.
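
Something along these lines would do it. This is only a sketch: the
fallback file name is invented so the snippet runs on its own, and the
three-argument form of open is my preference rather than something your
script requires.

```perl
use strict;
use warnings;

# Assumption: the output path is the second command-line argument, as in
# the original script. The fallback name is made up for this sketch.
my $output = defined $ARGV[1] ? $ARGV[1] : 'sorted_ids.out';

# Same "or die" check that the script already applies to FILE.
open( OUTPUTFILE, '>', $output ) or die "cannot open $output: $!\n";
print OUTPUTFILE "ok\n";

# Checking close on a file opened for writing catches errors such as a
# full disk, which may only show up when the buffer is flushed.
close OUTPUTFILE or die "cannot close $output: $!\n";
```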

>     chomp(my @lines = <FILE>);

Why chomp the input when you are just adding the newlines back on output?

>     my @chars = map {
>        my ($id) = m{<a id=(\w+)>};
>        [ $_, $id, scalar $id =~ /^\d+$/ ];
>     } @lines;

    my %seen;
    my @chars = grep !$seen{$_->[1]}++,
                map {
                    my ($id) = m{<a id=(\w+)>};
                    [ $_, $id, scalar $id =~ /^\d+$/ ];
                } @lines;

>     my @sorted_chars = sort {
>        $b->[2] <=> $a->[2]
>        or
>        ($a->[2] ? $a->[1] <=> $b->[1] : $a->[1] cmp $b->[1])
>        or
>        $a->[0] cmp $b->[0]
>     } @chars;
>     my @result = map { $_->[0] } @sorted_chars;
>     print OUTPUTFILE "$_\n" for @result;
>   close OUTPUTFILE;
> close FILE;

You don't really need four different arrays:

my %seen;
print OUTPUTFILE
    map $_->[0],
    sort {
        $b->[2] <=> $a->[2]
        or
        ( $a->[2] ? $a->[1] <=> $b->[1] : $a->[1] cmp $b->[1] )
        or
        $a->[0] cmp $b->[0]
        }
    grep !$seen{$_->[1]}++,
    map {
        my ( $id ) = /<a id=(\w+)>/;
        [ $_, $id, scalar $id =~ /^\d+$/ ];
        }
    <FILE>;
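
To see the whole chain at work without any files, here is a
self-contained sketch of the same pipeline. The input lines are
hypothetical (I am only assuming each line contains an <a id=...> tag,
as the regex implies), and the result goes into @result instead of
OUTPUTFILE so it can be inspected:

```perl
use strict;
use warnings;

# Hypothetical input lines, each ending in a newline as <FILE> would
# return them. The real input file is not reproduced here.
my @lines = map "$_\n",
    '<a id=15>fifteen',
    '<a id=ab>first ab',
    '<a id=2>two',
    '<a id=ab>second ab',
    '<a id=2>two again',
    '<a id=fa>fa';

my %seen;
my @result =
    map { $_->[0] }                     # 4. back to the original line
    sort {                              # 3. numeric IDs first, then alphas
        $b->[2] <=> $a->[2]
        or ( $a->[2] ? $a->[1] <=> $b->[1] : $a->[1] cmp $b->[1] )
        or $a->[0] cmp $b->[0]
    }
    grep { !$seen{ $_->[1] }++ }        # 2. keep first occurrence of each ID
    map {                               # 1. build [ line, id, is_numeric ]
        my ($id) = /<a id=(\w+)>/;
        [ $_, $id, scalar $id =~ /^\d+$/ ];
    } @lines;

print @result;
# prints:
# <a id=2>two
# <a id=15>fifteen
# <a id=ab>first ab
# <a id=fa>fa
```

The grep has to sit between the two maps: it needs the extracted ID from
the first map, and it has to run before the sort so that "first
occurrence" means first in the input file.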




John
-- 
use Perl;
program
fulfillment
