Wow -- that is really cool. I am going to go review hashes. How crazy compact!
Thanks a lot,
Tim

-----Original Message-----
From: Peter Scott [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 22, 2002 9:40 AM
To: Booher Timothy B 1stLt AFRL/MNAC; '[EMAIL PROTECTED]'
Subject: Re: Count Words

At 08:59 AM 1/22/02 -0600, Booher Timothy B 1stLt AFRL/MNAC wrote:
>I am trying to write a perl script to count the words (not counting
>duplicates) in a file based on the following definition of word:
>
>"A word is any collection of characters separated by white space or
>punctuation characters such as {.!?,}"
>
>I have a lot of ideas, but also the suspicion that someone else has done
>this before. Here is my basic approach.
>
>--> create two-dimensional array with following axes {x = word.length, y =
>word.string}
>--> read line
>    --> read first word
>    --> compare word against entire column of similar sized words
>        if found then promote word one higher in column
>        else add word to the bottom of the column and increment word count
>
>Any ideas on a more efficient approach -- anything else out there that does
>this?

Whoa, sounds like someone hasn't met hashes yet. Hashes are the first coolest
thing you encounter when learning Perl (unless you've come from awk, which I
don't think you have).

If we accept the set of word characters as being defined by \w, your problem
can be solved with this code:

    my %word;
    while (<>) {
        $word{$_}++ for /(\w+)/g;
    }

Somewhat simpler than you were imagining? Here's how it works:

    my %word;

Declare hash (since the code is going to run with "use strict").

    while (<>) {

While we can read a line from either files named on the command line or
standard input, put the line into the variable $_

    for /(\w+)/g;

Loop over all groups of consecutive word characters in $_, putting each one
into a temporary $_

    $word{$_}++

Increment the count stored in the hash corresponding to that word. If there
isn't one there yet, create one with an initial value of 0, then add 1 to it.
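For anyone who wants to experiment without a file on hand, here is a rough sketch of the same idea wrapped in a subroutine. The name count_words and the sample sentences are made up for illustration; they are not from the code above. It also swaps \w for a character class built from the question's own definition of a word (anything that is not white space or one of . ! ? ,), which is an assumption about what was wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): count how often
# each word occurs in a list of lines.  A "word" is taken to be any run
# of characters that are not white space or one of . ! ? ,
sub count_words {
    my @lines = @_;
    my %word;
    for my $line (@lines) {
        # In list context the match returns every captured word;
        # each one is aliased to $_ and its counter is incremented.
        $word{$_}++ for $line =~ /([^\s.!?,]+)/g;
    }
    return \%word;
}

# Made-up sample input, just to exercise the subroutine.
my $count = count_words("the cat sat.", "The cat, the hat!");

# Per-word counts, sorted by key.
print "$_: $count->{$_}\n" for sort keys %$count;

# The number of keys is the number of words "not counting duplicates".
print "distinct words: ", scalar(keys %$count), "\n";
```

Note that "The" and "the" are counted as different words here; putting each word through lc before using it as a key would fold them together if that is what you want.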
After the end of the loop you can dump the concordance with something like:

    print "$_: $word{$_}\n" for sort keys %word;

--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com