Jeff, very nice explanation. One thing, though: I am not using uniq to remove duplicates, I am using it to get a count of duplicates. In my case, I am setting a threshold to determine when someone (malicious) is scanning my address ranges.
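For reference, here is a minimal sketch of that count-then-threshold idea. The @ips list and the threshold of 3 are made-up values for illustration only, and it assumes the denied source addresses have already been pulled out of the log:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Made-up sample data and threshold, purely for illustration.
    my @ips       = qw(10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1 10.0.0.3 10.0.0.2);
    my $threshold = 3;

    # Same hash trick as $saw{$_}++, but keep the count instead of
    # throwing it away.
    my %count;
    $count{$_}++ for @ips;

    # Report only addresses seen at least $threshold times, busiest first.
    for my $ip (sort { $count{$b} <=> $count{$a} } keys %count) {
        print "$count{$ip} $ip\n" if $count{$ip} >= $threshold;
    }

The hash does the work of both sort and uniq -c here: the keys are unique, and the values carry the per-address counts.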
--------------------------------------------------------------------------
It uses post-increment and the fact that hash keys are unique to do its work:

    for (@in) {
        if (! $saw{$_}++ ) {
            push @out, $_;
        }
    }

Post-increment returns the value first, and THEN increments it.

    $x = 3;
    print $x++;   # 3
    print $x;     # 4

Thus, $saw{$_}++ will return 0 the first time $_ is looked up in the hash. !0 is 1 (true), so the first time $_ is seen, it is pushed onto the @out array AND it gets a value of 1 in the hash. The next time it's found, $saw{$_}++ returns 1 (and sets $saw{$_} to 2), and !1 is 0 (false), so $_ is not pushed a second time.

Craig -- which makes more sense:

    a) sort 1000 records, then remove duplicates; or
    b) remove the duplicates, then sort the remaining records?

The answer is, you can't be sure. It probably depends GREATLY on your data. Here are solutions for both a) and b).

    # a -- from perlfaq4 (perldoc -q uniq)
    my $prev = "NO_SUCH_VALUE";
    my @sorted = grep { $_ ne $prev and $prev = $_ } sort @records;

and

    # b -- also from perlfaq4
    my %seen;
    my @sorted = sort grep !$seen{$_}++, @records;

On May 23, A. Rivera said:

>On May 23, Craig Hammer said:
>> I am working on a script to read in a firewall logfile, pull out the IP
>> addresses of denied packets, then give me a count per IP address, and
>> perform a whois on each address.
>>
>> This previously ran as a VERY SLOW shell script. In bourne, I used sort
>> and then uniq to get a count per IP address. Is there something similar
>> to uniq within perl? (I already have it sorting correctly)
>
>sub uniq {
>  my @in=@_;
>  my (%saw,@out);
>  undef %saw;
>  @out = grep(!$saw{$_}++, @in);
>  return @out;
>}
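For completeness, a sketch of the log-reading side of the original question. The filename, the log format, and the "deny" keyword in the regex are all assumptions here, since the actual firewall format never appears in the thread:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical logfile name; real firewall formats vary, so the
    # patterns below are only illustrations.
    my $logfile = 'firewall.log';
    my %count;

    open my $fh, '<', $logfile or die "Cannot open $logfile: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /deny/i;                  # denied packets only (assumed keyword)
        if ($line =~ /(\d{1,3}(?:\.\d{1,3}){3})/) {    # first IPv4-looking string on the line
            $count{$1}++;
        }
    }
    close $fh;

    # One line per unique address with its count, like sort | uniq -c.
    print "$count{$_} $_\n" for sort { $count{$b} <=> $count{$a} } keys %count;

From there, the whois lookup from the original question can be run against just the keys of %count, so each address is looked up only once.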