beast wrote:
> Mumia W. wrote:
>> On 01/10/2007 01:35 AM, beast wrote:
>>> [...]
>>> It will only remove duplicate keys.
>>>
>>> Is this still acceptable in Perl? (it's very ugly =(
>>> [...]
>>
>> This would remove duplicate lines:
>>
>>     use List::MoreUtils qw(uniq);
>>     use File::Slurp;
>>     my @list = uniq read_file 'rm_dup_lines.txt';
>>     print $_ for @list;
> Is it not possible without using any external modules? :)
> 
>> I'm a little confused about what you're trying to do, because you split
>> on commas even though your data has no commas in it. Oh well, the code
>> above removes duplicate lines.
>>
> 
> The data was dummy; sorry for the confusion.
> The actual data comes from a log file, with fields separated by spaces.
> 
> while (<>) {
>   my ($username, $ipaddr, @rest) = split /\s+/;
>   #get the unique combinations of $username and $ipaddr
> }
> 
> And btw, since the data comes from a log, there can be several thousand
> lines, but there should be fewer than 100 _unique lines_.
> 
> --beast

I still think a hash would suffice.  If you use the IP address variable
as the key (because it SHOULD be unique), you can build your hash even
with undef values.  Once you place an IP into the hash for the first time,
write the line to a new file.  The IP from each subsequent line can then
be looked up in the hash; if it already exists, skip straight to the
next line.  If it doesn't exist in the hash, add it and write the
line out.
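
Here's a minimal sketch of that approach (printing to STDOUT instead of
opening a new file is my shortcut; I key on $ipaddr alone as described
above, but to get the unique username/IP combinations from the earlier
post, key the hash on "$username $ipaddr" instead):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %seen;    # IPs already written; undef values are all we need

    while (my $line = <>) {
        my ($username, $ipaddr) = split /\s+/, $line;
        next if exists $seen{$ipaddr};   # seen before: skip to the next line
        $seen{$ipaddr} = undef;          # first sighting: record the IP ...
        print $line;                     # ... and write the line out
    }

Run it as e.g. "perl uniq_ips.pl access.log > unique.log" (filenames
hypothetical); redirecting STDOUT stands in for the "new file" above.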

