Jeff, very nice explanation. One thing, though: I am not using uniq to remove duplicates, I am using it to get a count of duplicates. In my case, I am setting a threshold to determine when someone (malicious) is scanning my address ranges.
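For reference, here is a minimal sketch of that count-then-threshold idea. The @ips list and the threshold of 3 are made-up values for illustration only, and it assumes the denied source addresses have already been pulled out of the log:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Made-up sample data and threshold, purely for illustration.
    my @ips       = qw(10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1 10.0.0.3 10.0.0.2);
    my $threshold = 3;

    # Same hash trick as $saw{$_}++, but keep the count instead of
    # throwing it away.
    my %count;
    $count{$_}++ for @ips;

    # Report only addresses seen at least $threshold times, busiest first.
    for my $ip (sort { $count{$b} <=> $count{$a} } keys %count) {
        print "$count{$ip} $ip\n" if $count{$ip} >= $threshold;
    }

The hash does the work of both sort and uniq -c here: the keys are unique, and the values carry the per-address counts.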
--------------------------------------------------------------------------
It uses post-increment and the fact that hash keys are unique to do its work:

    for (@in) {
        if (! $saw{$_}++ ) {
            push @out, $_;
        }
    }

Post-increment returns the value first, and THEN increments it.

    $x = 3;
    print $x++;   # 3
    print $x;     # 4

Thus, $saw{$_}++ will return 0 the first time $_ is looked up in the hash. !0 is 1 (true), so the first time $_ is seen, it is pushed onto the @out array AND it gets a value of 1 in the hash. The next time it's found, $saw{$_}++ returns 1 (and sets $saw{$_} to 2), and !1 is 0 (false), so $_ is not pushed a second time.

Craig -- which makes more sense:

    a) sort 1000 records, then remove duplicates; or
    b) remove the duplicates, then sort the remaining records?

The answer is, you can't be sure. It probably depends GREATLY on your data. Here are solutions for both a) and b).

    # a -- from perlfaq4 (perldoc -q uniq)
    my $prev = "NO_SUCH_VALUE";
    my @sorted = grep { $_ ne $prev and $prev = $_ } sort @records;

and

    # b -- also from perlfaq4
    my %seen;
    my @sorted = sort grep !$seen{$_}++, @records;

On May 23, A. Rivera said:

>On May 23, Craig Hammer said:
>> I am working on a script to read in a firewall logfile, pull out the IP
>> addresses of denied packets, then give me a count per IP address, and
>> perform a whois on each address.
>>
>> This previously ran as a VERY SLOW shell script. In bourne, I used sort
>> and then uniq to get a count per IP address. Is there something similar
>> to uniq within perl? (I already have it sorting correctly)
>
>sub uniq {
>  my @in=@_;
>  my (%saw,@out);
>  undef %saw;
>  @out = grep(!$saw{$_}++, @in);
>  return @out;
>}
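For completeness, a sketch of the log-reading side of the original question. The filename, the log format, and the "deny" keyword in the regex are all assumptions here, since the actual firewall format never appears in the thread:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical logfile name; real firewall formats vary, so the
    # patterns below are only illustrations.
    my $logfile = 'firewall.log';
    my %count;

    open my $fh, '<', $logfile or die "Cannot open $logfile: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /deny/i;                  # denied packets only (assumed keyword)
        if ($line =~ /(\d{1,3}(?:\.\d{1,3}){3})/) {    # first IPv4-looking string on the line
            $count{$1}++;
        }
    }
    close $fh;

    # One line per unique address with its count, like sort | uniq -c.
    print "$count{$_} $_\n" for sort { $count{$b} <=> $count{$a} } keys %count;

From there, the whois lookup from the original question can be run against just the keys of %count, so each address is looked up only once.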