Wow -- that is really cool. I am going to go review hashes. How crazy
compact!

thanks a lot,

Tim

-----Original Message-----
From: Peter Scott [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 22, 2002 9:40 AM
To: Booher Timothy B 1stLt AFRL/MNAC; '[EMAIL PROTECTED]'
Subject: Re: Count Words


At 08:59 AM 1/22/02 -0600, Booher Timothy B 1stLt AFRL/MNAC wrote:
>I am trying to write a perl script to count the words (not counting
>duplicates) in a file based on the following definition of word:
>
>"A word is any collection of characters separated by white space or
>punctuation characters such as {.!?,}"
>
>I have a lot of ideas, but also the suspicion that someone else has done
>this before. Here is my basic approach.
>
>--> create two-dimensional array with following axes {x = word.length, y =
>word.string}
>--> read line
>         --> read first word
>         --> compare word against entire column of similarly sized words
>                 if found then promote word one higher in column
>                 else add word to the bottom of the column and increment
>                 word count
>
>Any ideas on a more efficient approach -- anything else out there that does
>this?

Whoa, sounds like someone hasn't met hashes yet.

Hashes are the first really cool thing you encounter when learning Perl
(unless you've come from awk, which I don't think you have).

If we accept the set of word characters as being defined by \w (letters,
digits, and underscore), your problem can be solved with this code:

         my %word;
         while (<>) {
           $word{$_}++ for /(\w+)/g;
         }

Somewhat simpler than you were imagining?  Here's how it works:

         my %word;

Declare hash (since the code is going to run with "use strict").

         while (<>) {

While we can read a line, either from files named on the command line or
from standard input, put that line into the variable $_.

           for /(\w+)/g;

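Loop over all groups of consecutive word characters in the line (the match
runs in list context, so it returns every match), putting each one into a
temporary $_ that exists only for that pass of the loop.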

         $word{$_}++

Increment the count stored in the hash corresponding to that word.  If 
there isn't one there yet, create one with an initial value of 0, then add 
1 to it.
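
To see that "create one, then add 1" step in isolation, here is a minimal
sketch. The word list is made-up sample input, not anything from your file:

```perl
use strict;
use warnings;

my %word;
# Assumed sample input, purely for illustration.
my @seen = qw(the cat the hat);

for my $w (@seen) {
    # A key that does not exist yet reads as undef; ++ treats that
    # as 0 and stores 1, and it does so without a warning.
    $word{$w}++;
}

print "the: $word{the}\n";
print "distinct: ", scalar(keys %word), "\n";
```

The number of keys in the hash is the count of words with duplicates
ignored, which is what you originally asked for.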

After the end of the loop you can dump the concordance with something like:

         print "$_: $word{$_}\n" for sort keys %word;
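
If you would rather stick to your own definition of a word (anything
separated by white space or the punctuation characters . ! ? ,) instead
of \w, a rough sketch follows. The name count_words and the sample lines
are my own inventions for the example, and it works on a list of lines
rather than reading files:

```perl
use strict;
use warnings;

# count_words is a hypothetical helper name, not from the original post.
sub count_words {
    my @lines = @_;
    my %word;
    for my $line (@lines) {
        # Split on runs of white space or the characters . ! ? ,
        # grep drops the empty leading field that split can produce
        # when a line starts with a separator.
        $word{$_}++ for grep { length } split /[\s.!?,]+/, $line;
    }
    return \%word;
}

# Made-up sample input.
my $count = count_words("The cat sat.\n", "The cat, again!\n");

print "$_: $count->{$_}\n" for sort keys %$count;
print "distinct words: ", scalar(keys %$count), "\n";
```

Note that this is case-sensitive, so "The" and "the" would be counted
separately; wrapping the word in lc() before incrementing is one way
around that if it matters to you.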



--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com
