Thanks all, that worked a treat.

Jonathan Musto

 

BT Global Services
Telephone - 0113 237 3277
Fax - 0113 244 1413
E-mail - [EMAIL PROTECTED]
http://www.technet.bt.com/sit/public


British Telecommunications plc 
Registered office: 81 Newgate Street London EC1A 7AJ 
Registered in England no. 1800000 
This electronic message contains information from British Telecommunications
plc which may be privileged or confidential. The information is intended to
be for the use of the individual(s) or entity named above. If you are not
the intended recipient, be aware that any disclosure, copying, distribution
or use of the contents of this information is prohibited. If you have
received this electronic message in error, please notify us by telephone or
email (to the numbers or address above) immediately.






-----Original Message-----
From: Jenda Krynicky [mailto:[EMAIL PROTECTED]
Sent: Monday, July 21, 2003 13:33
To: [EMAIL PROTECTED]
Subject: Re: Removing duplicate lines.


From: [EMAIL PROTECTED]
> I have a text file which contains a list of companies:
> 
> NORTH DOWN AND ARDS INSTITUTE
> NOTTINGHAM HEALTH AUTHORITY
> 1ST CONTACT GROUP LTD
> 1ST CONTACT GROUP LTD
> 1ST CONTACT GROUP LTD
> 1ST CONTACT GROUP LTD
> 4D TELECOM & KINGSTON INMEDIA
> A E COOK LTD
> A E COOK LTD
> 
> etc......
> 
> How can a write a simple perl script to remove the duplicates and
> leave just one of each customer? Any help would be great.

Is it safe to assume that all duplicates are adjacent like this? If
so, all you have to do is:
        1) read the file line by line
        2) print the line you just read only if it differs from the last one
        3) remember the line

Or, worded differently:
        1) read the file
        2) skip the line if it's the same as the last one
        3) otherwise print it and remember it

        my $last = '';
        while (<>) {
                next if $_ eq $last;
                print $_;
                $last = $_;
        }

If the duplicates are scattered all over the place, then the easiest 
solution is to use a hash; that will get rid of the duplicates for 
you (note that the output order will not match the input, since hash 
keys come back in no particular order):

        my %seen;
        while (<>) {
                chomp;
                $seen{$_}++;
        }

        foreach my $item (keys %seen) {
                print $item, "\n";
        }

If the list of companies is huge, you may need to store the hash on 
disk to prevent swapping:

        use DB_File;
        tie %seen, 'DB_File', $filename;
        ...

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

