http://www.effectiveperlprogramming.com/blog/314
Brian.

On Thu, Apr 21, 2011 at 2:42 PM, Marc Perry <marcperrys...@gmail.com> wrote:

> Hi,
>
> I was parsing a collection of HTML files where I wanted to extract a
> certain block from each file, like this:
>
> > ./script.pl *.html
>
> my $accumulator;
> my $capture_counter;
>
> while ( <> ) {
>     if ( /<h1>/.../labelsub/ ) {
>         $accumulator .= $_ unless /labelsub/;
>         if ( /labelsub/ && !$capture_counter ) {
>             print $accumulator;
>             $capture_counter = 1;
>         }
>         else {
>             next;
>         }
>     }
>     else {
>         next;
>     }
> }
> continue { # flush out the variables and clean up
>     if ( eof ) {
>         close ARGV;
>         $accumulator = '';
>         $capture_counter = '';
>     }
> }
>
> The bit about the $capture_counter is because some of the files have
> multiple blocks of text that could be accumulated, and I only want the
> first block in the file.
>
> This usually works fine, until I encountered an input file that did not
> contain the string 'labelsub' after the first '<h1>' regex pattern match.
> Then the conditional if test continued to search in the incoming lines in
> the next file (because I am processing a whole batch using the while (<>)
> operator), which it eventually found, and then printed nothing, because at
> the end-of-file of the previous file, the script flushed the contents of
> the accumulator.
>
> One solution is to just run the same script individually on each file, but
> I was wondering if there was a way to reset the 'state' of the range
> operator pattern match at the end of the physical file (or at any other
> time for that matter)?
>
> Thanks,
>
> --Marc
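
The problem described above comes from the range operator keeping its true/false state across files. One way to sidestep it, sketched here as an assumption and not as the approach taken in the linked article, is to keep that state in an ordinary variable that starts fresh for every file. The helper name `first_blocks` and the variables `$in_block` and `$block` are hypothetical, and the sketch only roughly mirrors the original script: it returns the first `<h1>` to `labelsub` block of each file and silently drops a block that never reaches `labelsub`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first <h1> ... labelsub block of each file.
sub first_blocks {
    my @files = @_;
    my @blocks;

    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";

        my $in_block = 0;     # hand-rolled range state, reset for every file
        my $block    = '';

        while ( my $line = <$fh> ) {
            if ( !$in_block ) {
                next unless $line =~ /<h1>/;
                $in_block = 1;    # start line; like "...", the end pattern is not tested here
            }
            elsif ( $line =~ /labelsub/ ) {
                push @blocks, $block;    # complete block: keep it and skip the rest of the file
                last;
            }
            $block .= $line;
        }
        close $fh;    # an unfinished block is simply discarded
    }
    return @blocks;
}

print first_blocks(@ARGV);
```

Run as `./script.pl *.html`, this prints at most one block per file. If you would rather keep the range operator, folding `eof` into its right-hand side, as in `/<h1>/ ... (/labelsub/ || eof)`, should also end any open range at the end of each file, with the caveat that `...` does not test the right-hand side on the line that opened the range.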