On Thu, 2006-04-13 at 15:05 +0100, Max von Seibold wrote:
> I'm trying to write a small word counting script - I'm certain there are
> zillions out there but it seemed a good learning exercise...
>
> Basically I read in each line from my text file which I want to count
> with a basic while loop and file handle.
>
> The problem is on the count. I know I could split the variable holding
> the entire line on each whitespace and load all the components into an
> array which I could subsequently count. However this seems a bit crude
> and I was wondering if there is a better way to do it with regular
> expressions which processes each line as a batch.
>
> I know how to test for the occurrence of a 'word' with
>
> $lineFromFile =~ m/\s\w\s/i
>
> However this only tells me if there are individual words in each line.
> Is there some way I can count their occurrences? Something involving $1 ?

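For the total-count case asked about above, one possible approach is to let the regex do the counting for each line. Note that \s\w\s matches only a single word character surrounded by whitespace, so the sketch below uses \b\w+\b instead. It is a minimal sketch, assuming a "word" is a run of \w characters (so "don't" would count as two words); count_words and the sample line are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count the words in a single line.
# A /g match in list context returns every match; assigning that list
# to an empty list and then to a scalar yields the number of matches.
sub count_words {
    my ($line) = @_;
    my $count = () = $line =~ /\b\w+\b/g;
    return $count;
}

# Sample input (an assumption for the demo, standing in for lines
# read from a file).
my @lines = ("This word is the word after the word at the beginning.\n");

my $total = 0;
$total += count_words($_) for @lines;

print "Total words: $total\n";
```

In a real script the loop over @lines would become a while loop over the file handle, calling count_words on each line as it is read.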
If you just want a total count of all words in the file, then Shawn had
a good example. But I interpreted your question to mean that you would
like to know how many times each word appears in a file. For example:

'This word is the word after the word at the beginning.'

...would match 'This' once, 'word' three times, etc. If that is what
you're looking for, then try this.

# Always use strict!
use strict;
use warnings;

# initialize word count hash
my %wc;

# Loop over the lines in the input file
while (<>) {
    # While the regex keeps matching, increment the word count hash
    # at key <word>. That is, every time the regex matches in the
    # line, increment the hash entry for the word that was matched.
    $wc{$1}++ while m/\b(\w+)\b/g;
}

# print the results by 'map'ing the sorted hash keys to a line in the
# format of:
# [this] => [number]
print map { "[$_] => [$wc{$_}]\n" } sort keys %wc;

# print the results again, this time sorted by number of times the word
# was encountered in the file by overriding the sort function's
# comparison.
print map { "[$_] => [$wc{$_}]\n" }
    sort { $wc{$a} <=> $wc{$b} } keys %wc;

__END__

HTH

--
Joshua Colson <[EMAIL PROTECTED]>
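
A possible variation on the per-word count, offered only as a sketch: the version above treats 'This' and 'this' as different words and lists the least frequent words first. The example below folds case with lc and sorts the most frequent words first, breaking ties alphabetically. The helper name word_counts and the sample sentence are hypothetical, and whether case folding is wanted depends on the text being counted.

```perl
use strict;
use warnings;

# Hypothetical helper: build a word => count hash from a list of lines,
# folding case so that 'This' and 'this' share one key.
sub word_counts {
    my @lines = @_;
    my %wc;
    for my $line (@lines) {
        # Each successful /g match in scalar context advances through
        # the line and leaves the matched word in $1.
        $wc{ lc $1 }++ while $line =~ /\b(\w+)\b/g;
    }
    return %wc;
}

# Sample input (an assumption for the demo).
my %wc = word_counts("This word is the word after the word at the beginning.\n");

# Most frequent first; ties are broken alphabetically.
for my $word (sort { $wc{$b} <=> $wc{$a} or $a cmp $b } keys %wc) {
    print "[$word] => [$wc{$word}]\n";
}
```

For the sample sentence this lists 'the' and 'word' first, each with a count of 3, followed by the words that appear once.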