Re: Regrex - ignoring the boundaries

Rajeev Rumale Wed, 11 Jul 2001 21:16:53 -0700
Thanks.

For the main problem, and also for the unforeseen ones :-)

with regards

Rajeev Rumale

___________________________________________________

     Yesterday is history.  Tomorrow a mystery.
     Today is a gift. That's why it's called Present.
___________________________________________________



----- Original Message -----
From: "Stephen P. Potter" <[EMAIL PROTECTED]>
To: "Rajeev Rumale" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, July 12, 2001 11:29 AM
Subject: Re: Regrex - ignoring the boundaries


> Lightning flashed, thunder crashed and "Rajeev Rumale" <[EMAIL PROTECTED]> whispered:
> | I am attaching both perl code and html file (zipped together).  Please
> | check out and let me know where I am wrong.
>
> In the future, please don't use attachments (especially binary formats like
> zip) for this kind of thing.  It is so much easier to deal with when it is
> just inlined ASCII.
>
> Here's your original code, minus a few extra blank lines:
> 1: # pattern matching and extraction.
> 2:
> 3: open (FILE, "test.htm");
> 4:
> 5: @lines = <FILE>;
> 6:
> 7: $text = join("",@lines);
> 8:
> 9: $text=~s/\n/%%new_line%%/g;  # the script does not work correctly if commented.
> 10:print "found" if $text=~m/%%row_start%%(.+)%%row_end%%/;
> 11:
> 12:$tstart ="%%row_start%%";
> 13:$tend = "%%row_end%%";
> 14:
> 15:$text=~/(($tstart)(.+)($tend))/;
> 16:
> 17:print "The complete pattern is :\n $1\n\n";
> 18:print "The string without delimiters is :\n $3\n\n";
>
>
> I'll go ahead and do a complete code review for you, even though you are
> only asking a specific question.
>
> First of all, in line 3 you aren't checking your return code, which you
> should always do.  In lines 5-7, you read the file into an array, then join
> the array into a scalar.  This is taking at least twice the amount of
> memory you need.  If this happened to be a large file you were working
> with, this could severely impact the performance (possibly killing the
> script if you ran out of memory).
>
> Next you do the same match twice, once with the delimiters inlined and once
> with them set as variables.  You then save several chunks of information
> that aren't particularly needed.
>
> Have you checked out the "s" modifier for the m// and s/// operators?  From perlre:
>
>        s   Treat string as single line.  That is, change "." to
>            match any character whatsoever, even a newline, which
>            normally it would not match.
>
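To see the effect of that modifier in isolation, here is a minimal sketch. The sample string is made up for illustration, since the real test.htm is not shown in this thread; only the delimiters are taken from the code above.

```perl
use strict;
use warnings;

# Hypothetical sample text; the real test.htm is not shown in this thread.
my $text = "%%row_start%%first line\nsecond line%%row_end%%";

# Without /s, "." stops at the newline, so the match fails.
my $without_s = $text =~ /%%row_start%%(.+)%%row_end%%/ ? 1 : 0;

# With /s, "." also matches "\n", so the capture spans both lines.
my ($captured) = $text =~ /%%row_start%%(.+)%%row_end%%/s;

print "without /s: ", ($without_s ? "matched" : "no match"), "\n";
print "with /s:    ", (defined $captured ? "matched" : "no match"), "\n";
```

This is presumably why the original script only worked after replacing every newline with %%new_line%%: with no newlines left in the string, "." could reach the closing delimiter.
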
> Finally you print the answers.  Here's my rewrite of your code, taking
> advantage of several of perl's strengths:
>
> 1: # pattern matching and extraction.
> 2:
> 3: open FILE, "test.htm" or die "Can't open test.htm: $!\n";
> 4: $/ = undef;
> 5: $text = <FILE>;
> 6: close FILE;
> 7:
> 8: if ($text =~ /%%row_start%%(.+)%%row_end%%/s) {
> 9: print "found\n\n";
> 10: print "The complete pattern is :\n $&\n\n";
> 11: print "The string without delimiters is :\n $1\n\n";
> 12:}
>
> As you can see, I've saved six lines of code and a large chunk of memory.
> On line 3, we check the status of the open.  Sure, in this case, it didn't
> particularly matter.  But it is always best to check.  Line 4 sets the
> input record separator to "undef".  Doing this causes line 5 to read the
> entire file into a single string.  Line 6 closes the filehandle.  Perl
> automatically closes all opened filehandles when the script shuts down, but
> it is "polite" to clean up after yourself.
>
> Line 8 checks to see if $text contains the two delimiters we're looking
> for.  Note the /s at the end to treat the string as a single line, so that
> "." also matches the newline characters during the match.  Also note that
> we capture only the text that we want, and nothing else.  This is, again, a
> memory saver.  Line 10 prints $&, a special variable that contains the text
> matched by the most recent successful pattern match.  Line 11 prints $1,
> the text we captured in line 8.
>
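For comparison, here is one possible variant of the same steps. It is only a sketch and not the code posted above. The slurp_file helper and the temporary file standing in for test.htm are made up for illustration. It uses a lexical filehandle, restricts the change to $/ with "local", and uses a non-greedy .+? so that the match stops at the first %%row_end%%.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into one scalar. "local" limits the change to $/
# to the do block, so other reads in the program still see lines.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    return $text;
}

# Hypothetical stand-in for test.htm, which is not shown in this thread.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "<tr>\n%%row_start%%<td>cell</td>\n%%row_end%%\n</tr>\n";
close $out or die "Can't close $path: $!\n";

my $text = slurp_file($path);

# .+? is non-greedy: it stops at the first %%row_end%% it can reach.
if ($text =~ /%%row_start%%(.+?)%%row_end%%/s) {
    print "The string without delimiters is :\n $1\n\n";
}
```

The greedy (.+) in the rewrite above behaves the same when the file holds a single pair of delimiters. With several pairs it would run on to the last %%row_end%%, so which form is right depends on what test.htm actually contains.
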
> -spp
> --
> Stephen P Potter [EMAIL PROTECTED]
> "You can't just magically invoke Larry and expect that to prove your point.
> Or prove that you have a point." -Simon Cozens
> UNIX and Perl Consulting and Training http://www.unixlabs.net/~spp/
>