Beautiful; I should've known that brian d foy would have come up with a
solution--I even have a copy of that book!
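
For the archives, here is a minimal sketch of one way to get a range
operator with fresh state for every file: build it inside a closure and
create a new closure per file. As far as I know, each closure carries its
own flip-flop state, so a block left open in one file cannot leak into the
next. The names make_extractor and first_block are made up for this
example, and it loops over @ARGV explicitly instead of using while (<>).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a new closure on every call. The closure
# captures $start and $end, so (assumption stated above) each one gets its
# own copy of the range operator's hidden state.
sub make_extractor {
    my ( $start, $end ) = @_;
    return sub {
        my ($line) = @_;
        # scalar() forces the flip-flop meaning of '...' in any context.
        return scalar( $line =~ $start ... $line =~ $end );
    };
}

# Hypothetical helper: returns the first <h1> ... labelsub block read from
# $fh (closing line excluded), or undef if the block never closes.
sub first_block {
    my ($fh) = @_;
    my $in_block    = make_extractor( qr/<h1>/, qr/labelsub/ );
    my $accumulator = '';
    while ( my $line = <$fh> ) {
        my $seq = $in_block->($line) or next;    # outside the range
        return $accumulator if $seq =~ /E0\z/;   # last line of the range
        $accumulator .= $line;
    }
    return;    # end-of-file with the range still open: discard
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $block = first_block($fh);
    print $block if defined $block;
    close $fh;
}
```

It runs the same way as before (./script.pl *.html). A file whose block
never reaches 'labelsub' prints nothing and does not affect the next file.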

Thanks,

--Marc

On Thu, Apr 21, 2011 at 3:10 PM, Brian Fraser <frase...@gmail.com> wrote:

> http://www.effectiveperlprogramming.com/blog/314
>
> Brian.
>
> On Thu, Apr 21, 2011 at 2:42 PM, Marc Perry <marcperrys...@gmail.com> wrote:
>
>> Hi,
>>
>> I was parsing a collection of HTML files where I wanted to extract a certain
>> block from each file, like this:
>>
>> > ./script.pl *.html
>>
>> my $accumulator;
>> my $capture_counter;
>>
>> while ( <> ) {
>>    if ( /<h1>/.../labelsub/ ) {
>>        $accumulator .= $_ unless /labelsub/;
>>        if ( /labelsub/ && !$capture_counter ) {
>>            print $accumulator;
>>            $capture_counter = 1;
>>        }
>>        else {
>>            next;
>>        }
>>    }
>>    else {
>>        next;
>>    }
>> }
>> continue { # flush out the variables and clean up
>>    if ( eof ) {
>>        close ARGV;
>>        $accumulator = '';
>>        $capture_counter = '';
>>    }
>> }
>>
>> The $capture_counter is there because some of the files have multiple
>> blocks of text that could be accumulated, and I only want the first block
>> in each file.
>>
>> This usually worked fine, until I encountered an input file that did not
>> contain the string 'labelsub' after the first '<h1>' pattern match. The
>> range operator then stayed true and kept searching the incoming lines of
>> the next file (because I am processing a whole batch using the while (<>)
>> operator) until it eventually found 'labelsub'. At that point the script
>> printed nothing, because at the end-of-file of the previous file it had
>> already flushed the contents of the accumulator.
>>
>> One solution is to just run the same script individually on each file, but
>> I was wondering if there is a way to reset the 'state' of the range
>> operator pattern match at the end of the physical file (or at any other
>> time, for that matter)?
>>
>> Thanks,
>>
>> --Marc
>>
>
>
