Ricardo SIGNES [RS], on Wednesday, November 2, 2005 at 09:24 (-0500)
wrote the following:

RS>  my %occurances; # we'll put occurrences here
 
RS>  while (<>) {                # get each line of argument files (or stdin)
RS>         chomp;                    # eliminate the newline
RS>         my @words = split /\s+/;  # break into words on spaces
RS>         next if @words != 2;      # we only care about two-word lines
RS>         push @{$occurances{"@words"}}, $.; # add current line ($.) to list of
RS>                                            # lines where we saw these words
RS>  }

RS>  for my $words (keys %occurances) {      # for each found pair
RS>          my @lines = @{ $occurances{$words} }; # get the occurrences
RS>    print
RS>                  "$words - seen ",   # we saw this pair...
RS>                  scalar @lines,      # as many times as there are occurrences
RS>                  " times: (lines: ", join(", ", @lines), ")\n"; # and then list lines
RS>  }

Thanks for your cool snippet; I just finished mine. I also added one
condition: I am interested only in certain words, in this example
"business". For example, my words.txt has 100,000 lines, and every line
contains "business" somewhere, so counting only the pairs that contain
the keyword is faster and uses less memory. Here it is:

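use strict;
use warnings;

my $keyword = "business";   # only count word pairs containing this keyword
my %all2words = ();         # word pair => number of times seen
open my $txt, '<', "words.txt" or die "Cannot open words.txt: $!";
while (<$txt>) {
        chomp;
        # the lookahead captures two adjacent words without consuming them
        while ( /(?=(\S+\s+\S+))\S+/g ) {
                my $temp = $1;
                $all2words{$temp}++ if $temp =~ /\Q$keyword\E/i;
        }
}
close $txt or die $!;

# print the pairs, most frequent first
for my $test ( sort { $all2words{$b} <=> $all2words{$a} } keys %all2words ) {
        print "$test => $all2words{$test}\n";
}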
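
To show what the pattern /(?=(\S+\s+\S+))\S+/g picks out, here is a small
sketch run on a single made-up line. The sub name word_pairs and the sample
line are only for illustration; they are not part of the script above. The
lookahead captures two words without consuming them, and the trailing \S+
then consumes only the first word, so the next match starts at the
following word and the pairs overlap.

```perl
use strict;
use warnings;

# Hypothetical helper: return every overlapping pair of adjacent words.
sub word_pairs {
    my ($line) = @_;
    my @pairs;
    # (?=(...)) captures "word word" but consumes nothing;
    # the trailing \S+ moves the match position past the first word only.
    while ( $line =~ /(?=(\S+\s+\S+))\S+/g ) {
        push @pairs, $1;
    }
    return @pairs;
}

# Made-up sample line, not taken from words.txt.
print "$_\n" for word_pairs("small business plan template");
# prints:
# small business
# business plan
# plan template
```

A line with N words gives N-1 pairs, so a line with only one word
contributes nothing.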

The output of my script is:
small business => 6318
business solutions => 3364
business plan => 2807
business card => 2388

and so on. It is not exact, but it is good enough to get a rough picture.
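
The message does not say why the counts are inexact. One possible
contributor, and this is only an assumption, is that pairs differing in
case or in attached punctuation ("Business plan," versus "business plan")
end up under separate hash keys. A minimal sketch of folding such variants
together before counting follows; normalize_pair is a hypothetical helper
and the sample pairs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case each word and strip punctuation from
# its ends, then rejoin the words with a single space.
sub normalize_pair {
    my ($pair) = @_;
    my @words = map {
        my $w = lc $_;
        $w =~ s/^[[:punct:]]+//;   # leading punctuation
        $w =~ s/[[:punct:]]+$//;   # trailing punctuation
        $w;
    } split /\s+/, $pair;
    return join ' ', @words;
}

# Made-up sample pairs that should all count as the same key.
my %count;
$count{ normalize_pair($_) }++
    for ( "Business plan,", "business plan", "business  Plan." );
print "$_ => $count{$_}\n" for sort keys %count;
# prints: business plan => 3
```

In the script above this would mean using normalize_pair($temp) as the hash
key instead of $temp.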



