Ricardo SIGNES [RS], on Wednesday, November 2, 2005 at 09:24 (-0500)
wrote the following:
RS> my %occurances; # we'll put occurrences here
RS> while (<>) { # get each line of argument files (or stdin)
RS>   chomp; # eliminate the newline
RS>   my @words = split /\s+/; # break into words on spaces
RS>   next if @words != 2; # we only care about two-word lines
RS>   push @{$occurances{"@words"}}, $.; # add current line ($.) to list of
RS>                                      # lines where we saw these words
RS> }
RS> for my $words (keys %occurances) { # for each found pair
RS>   my @lines = @{ $occurances{$words} }; # get the occurrences
RS>   print
RS>     "$words - seen ", # we saw this pair...
RS>     scalar @lines, # as many times as there are occurrences
RS>     " times: (lines: ", join(", ", @lines), ")\n"; # and then list lines
RS> }
Thanks for your cool snippet; I have just finished mine. I also added one
condition: I am interested only in pairs that contain a certain word, in
this example "business". (My words.txt has 100,000 lines, and every line
contains "business" somewhere, so keeping only those pairs is faster and
uses less memory.) Here it is:
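use strict;
use warnings;

my $keyword   = "business";
my %all2words = ();

open my $txt, '<', "words.txt" or die "Cannot open words.txt: $!";
while (<$txt>) {
    chomp;
    # The lookahead captures two adjacent words but consumes only the
    # first, so overlapping pairs are found.
    while ( /(?=(\S+\s+\S+))\S+/g ) {
        my $temp = $1;
        $all2words{$temp}++ if $temp =~ /\Q$keyword\E/i;
    }
}
close $txt or die "Cannot close words.txt: $!";

# Print the pairs, most frequent first.
for my $test ( sort { $all2words{$b} <=> $all2words{$a} } keys %all2words ) {
    print "$test => $all2words{$test}\n";
}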
The output is:
small business => 6318
business solutions => 3364
business plan => 2807
business card => 2388
and so on. It is not exact, but it is good enough to give a rough picture.
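
The post does not say where the inexactness comes from. Three likely
sources, and these are assumptions: pairs that differ only in case are
counted separately, pairs that differ only in the amount of whitespace
between the words are counted separately, and /business/ also matches
inside longer words such as "businessman". A minimal sketch of one way
to tighten this up follows. count_pairs is a hypothetical helper, not
part of the code above, and the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count adjacent word
# pairs that contain $keyword as a whole word, ignoring case and the
# amount of whitespace between the two words.
sub count_pairs {
    my ( $keyword, @lines ) = @_;
    my %count;
    for my $line (@lines) {
        # The lookahead captures two words but consumes only the first,
        # so the next match starts at the second word.
        while ( $line =~ /(?=(\S+\s+\S+))\S+/g ) {
            my $pair = lc $1;      # fold case
            $pair =~ s/\s+/ /;     # collapse the gap to a single space
            $count{$pair}++ if $pair =~ /\b\Q$keyword\E\b/i;
        }
    }
    return %count;
}

# Made-up sample input, for illustration only.
my %seen = count_pairs(
    'business',
    'Small business plan',
    'small   BUSINESS card',
    'businessman card',
);

# Most frequent first; ties are broken alphabetically.
for my $pair ( sort { $seen{$b} <=> $seen{$a} || $a cmp $b } keys %seen ) {
    print "$pair => $seen{$pair}\n";
}
```

For the three sample lines this prints:

small business => 2
business card => 1
business plan => 1

The "businessman card" pair is skipped because of the \b word boundaries.
Whether that is wanted depends on the data, so treat it as an option
rather than a fix.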