Andrej Kastrin wrote:
> John W. Krahn wrote:
> 
>> Andrej Kastrin wrote:
>>> 
>>> I want to count words in the following file:
>>> ------------------------------
>>> ID- some number
>>> TI- some text BB
>>> AB- some text A BB
>>> AU- some text
>>>
>>> ID- some number
>>> TI- some GGG text
>>> AB- some text GGG
>>> AU- some text
>>>
>>> ID- some number
>>> TI- some text
>>> AB- some text Z
>>> AU- some text
>>> ------------------------------
>>>
>>> I wrote a script which parses through the file and returns the total
>>> number of words defined in @list. Here is the problem: the same word can
>>> occur more than once in a record (see the first record, where BB
>>> occurs twice).
>>>
>>> I don't know how to modify my code so that, if the same word occurs
>>> several times in a record, it is counted at most once for that record.
>>
>>
>> The value of $/ starts out at "\n", in other words, read one "line" at
>> a time.
>>
>>> while (<>){
>>>     $/="\n\n"; #set input separator to read record
>>>     $/="\n"; #set input separator to parse within a record
>>
>> You change $/ and then change it back to the default, so in effect you
>> are not really changing $/ at all.
>>
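
To make the effect of $/ concrete, here is a minimal sketch that reads the
same text once in line mode and once in paragraph mode. The helper name
count_reads and the sample text are made up for illustration; they are not
part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many chunks a read loop returns for a
# given value of $/ when reading the same in-memory text.
sub count_reads {
    my ( $text, $separator ) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while ( defined( my $chunk = <$fh> ) ) {
        $count++;
    }
    close $fh;
    return $count;
}

# Two short records separated by one blank line (made-up sample data).
my $sample = "ID- 1\nTI- some text BB\n\nID- 2\nTI- some GGG text\n";

print count_reads( $sample, "\n" ), " reads in line mode\n";
print count_reads( $sample, '' ),   " reads in paragraph mode\n";
```

With $/ set to "\n" every line, including the blank one, is a separate read;
with $/ set to '' each blank-line-separated record comes back in one read.
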
>>>     chomp;
>>
>> chomp() is not really needed.
>>
>>>     if(/^TI.+/){
>>
>> You are only modifying $wds for lines that begin with the string 'TI',
>> so the 'BB' on the line beginning with 'AB' will not be counted.
>>
>>>           foreach $w (split){
>>>                  $wds++ if defined($words{$w})
>>>           }
>>>     }
>>> }
>>>
>>> print "\n$wds words"; #print frequency of words, defined in @list
>>
> OK, I see, but the problem still persists. How can I modify my code so
> that if there are two or more identical words in one record (or in one
> line), only one is counted?

If I understand your requirements correctly, you probably want something like
this (UNTESTED):

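my @list = qw( A BB GGG Z ); #define term list

$/ = ''; #set input separator to paragraph mode
my $wds = 0; #start at 0 so the final print never sees undef
while ( <> ) {
    for my $w ( @list ) {
        #count each term at most once per record; only 'TI' lines are checked
        #\Q...\E quotes regex metacharacters, (?!\S) also matches at end of input
        $wds++ if /^TI.*?\s\Q$w\E(?!\S)/m;
    }
}

print "\n$wds words\n"; #print frequency of words, defined in @list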
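
If the terms should be counted on every line of a record rather than only on
the 'TI' lines, a per-record %seen hash is one way to do it. The following is
only a sketch under that assumption: the sub name count_terms_once_per_record
is made up, and the records are split from an in-memory string instead of
being read with $/ so that the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: for each blank-line-separated record in $text, count
# every term from @$terms at most once, however often it occurs there.
sub count_terms_once_per_record {
    my ( $text, $terms ) = @_;
    my %wanted = map { $_ => 1 } @$terms;
    my $total  = 0;
    for my $record ( split /\n{2,}/, $text ) {
        my %seen;    # terms already counted in this record
        for my $word ( split ' ', $record ) {
            $total++ if $wanted{$word} && !$seen{$word}++;
        }
    }
    return $total;
}

# The three sample records from the original question.
my $data = join "\n",
    'ID- some number', 'TI- some text BB',  'AB- some text A BB', 'AU- some text', '',
    'ID- some number', 'TI- some GGG text', 'AB- some text GGG',  'AU- some text', '',
    'ID- some number', 'TI- some text',     'AB- some text Z',    'AU- some text', '';

print count_terms_once_per_record( $data, [qw( A BB GGG Z )] ), " words\n";
```

For the sample data this counts A and BB once each in the first record, GGG
once in the second and Z once in the third.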




John
-- 
use Perl;
program
fulfillment