On Thu, 2006-04-13 at 15:05 +0100, Max von Seibold wrote:
> I'm trying to write a small word counting script - I'm certain there are 
> zillions out there but it seemed a good learning exercise...
> 
> Basically I read in each line from my text file which I want to count 
> with a basic while loop and file handle.
> 
> The problem is on the count. I know I could split the variable holding 
> the entire line on each whitespace and load all the components into an 
> array which I could subsequently count. However this seems a bit crude 
> and I was wondering if there is a better way to do it with regular 
> expressions, one which processes each line as a batch.
> 
> 
> I know how to test for the occurrence of a 'word' with   
> 
> $lineFromFile =~ m/\s\w\s/i
> 
> 
> However this only tells me if there are individual words in each line. 
> Is there some way I can count their occurrences? Something involving $1  ?

If you just want a total count of all words in the file, then Shawn had
a good example. But I interpreted your question to mean that you would
like to know how many times each word appears in a file. For example:

'This word is the word after the word at the beginning.'

...would match 'This' once, 'word' three times, etc. If that is what
you're looking for, then try this.

# Always use strict!
use strict;
# initialize word count hash
my %wc;

# Loop over the lines in the input file
while(<>) {
  # while the matching regex is returning true, increment the word
  # count hash at key <word>. That is, every time the regex matches in
  # the line, increment the hash key corresponding to the word that
  # was matched.
  $wc{$1}++ while m/\b(\w+)\b/g;
}

# print the results by 'map'ing the sorted hash keys to a line in the
# format of:
# [this] => [number]
print map { "[$_] => [$wc{$_}]\n" } sort keys %wc;

# print the results again, this time sorted by the number of times each
# word was encountered in the file (fewest first), by passing sort a
# custom comparison block.
print map { "[$_] => [$wc{$_}]\n" }
  sort { $wc{$a} <=> $wc{$b} } keys %wc;

__END__
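
If all you need is the number of words on one line, a global match in
list context can be counted directly, with no array and no $1. Here is a
minimal sketch of that, plus a case-folding variant of the per-word tally.
The helper names count_words and tally_words are made up for
illustration, and folding case with lc is my assumption about what you
might want, not something the script above does:

```perl
use strict;
use warnings;

# Hypothetical helper: return the number of words on one line.
# Assigning the list of matches to the empty list "()" and then to a
# scalar yields the number of matches.
sub count_words {
    my ($line) = @_;
    my $count = () = $line =~ /\w+/g;
    return $count;
}

# Hypothetical helper: tally each word of a line into the hash that
# $wc refers to, folding case so 'This' and 'this' share one key.
sub tally_words {
    my ($line, $wc) = @_;
    $wc->{lc $1}++ while $line =~ /(\w+)/g;
    return $wc;
}

my $sample = 'This word is the word after the word at the beginning.';
my %wc;
tally_words($sample, \%wc);
printf "%d words, 'word' seen %d times\n", count_words($sample), $wc{word};
```

Inside a while(<>) loop you would call count_words($_) and add the
result to a running total.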

HTH

-- 
Joshua Colson <[EMAIL PROTECTED]>

