http://www.effectiveperlprogramming.com/blog/314

Brian.

On Thu, Apr 21, 2011 at 2:42 PM, Marc Perry <marcperrys...@gmail.com> wrote:

> Hi,
>
> I was parsing a collection of HTML files where I wanted to extract a certain
> block from each file, like this:
>
> > ./script.pl *.html
>
> my $accumulator;
> my $capture_counter;
>
> while ( <> ) {
>    if ( /<h1>/.../labelsub/ ) {
>        $accumulator .= $_ unless /labelsub/;
>        if ( /labelsub/ && !$capture_counter ) {
>            print $accumulator;
>            $capture_counter = 1;
>        }
>        else {
>            next;
>        }
>    }
>    else {
>        next;
>    }
> }
> continue { # flush out the variables and clean up
>    if ( eof ) {
>        close ARGV;
>        $accumulator = '';
>        $capture_counter = '';
>    }
> }
>
> The $capture_counter is there because some of the files have multiple
> blocks of text that could be accumulated, and I only want the first block
> in each file.
>
> This usually worked fine, until I encountered an input file that did not
> contain the string 'labelsub' after the first '<h1>' regex pattern match.
> The range test then stayed true and kept searching the incoming lines of
> the next file (because I am processing a whole batch using the while (<>)
> operator) until it eventually found 'labelsub'. It then printed nothing,
> because at the end-of-file of the previous file the script had flushed the
> contents of the accumulator.
>
> One solution is to just run the same script individually on each file, but
> I was wondering if there is a way to reset the 'state' of the range
> operator pattern match at the end of the physical file (or at any other
> time, for that matter)?
>
> Thanks,
>
> --Marc
>
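
For what it is worth, each range operator keeps its own private state and,
as far as I know, Perl offers no built-in way to reset it directly. One
possible workaround is to make the right-hand operand also true at the end
of each file, so the range always closes before the next file starts. The
following is only a sketch under assumptions: the sample file contents and
the first_blocks() helper are made up for illustration, and this is not
necessarily the approach the linked post takes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: the first file opens a block with <h1> but
# never reaches 'labelsub'; the second file holds two complete blocks.
my @samples = (
    "<h1>Broken</h1>\nno end marker here\n",
    "intro\n<h1>Good</h1>\nbody\nlabelsub\n<h1>Second</h1>\nlabelsub\n",
);

# Write each sample to a temporary file that is removed at program exit.
my @files;
for my $text (@samples) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

# first_blocks() is an illustrative helper, not part of the original script.
# It returns the first <h1> ... labelsub block found in each file, if any.
sub first_blocks {
    local @ARGV = @_;
    my @blocks;
    my $accumulator = '';
    my $captured    = 0;

    while ( my $line = <> ) {
        # '|| eof' forces the range to end on the last line of every file,
        # so an unterminated block cannot leak into the next file.
        if ( $line =~ /<h1>/ ... ( $line =~ /labelsub/ || eof ) ) {
            if ( $line =~ /labelsub/ ) {
                push @blocks, $accumulator unless $captured;
                $captured = 1;
            }
            elsif ( !$captured ) {
                $accumulator .= $line;
            }
        }
    }
    continue {
        if (eof) {    # end of the current file: reset per-file state
            close ARGV;
            $accumulator = '';
            $captured    = 0;
        }
    }
    return @blocks;
}

my @blocks = first_blocks(@files);
print @blocks;
```

Run against the two sample files, this prints only the "Good" block from
the second file; the unterminated block in the first file is discarded.
One caveat: with the three-dot form the right operand is not tested on the
line that opens the range, so an <h1> on the very last line of a file would
still leave the range open going into the next file.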
