Dan Armstrong wrote:
I'm using a regular expression to extract text from an html file. My code:

while (<>) {
    if (/\<FONT SIZE=2 COLOR="#0000FF">(.*?)<\/FONT>/) {
        print "$1\n";
    }
}

When I run it on this big glom of HTML, I get only the first
occurrence, the words LA BELLE EPOQUE.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <html> <head> <title> Summary of Latest Inspection </title> </head> <body> <p> <TABLE BORDER CELLPADDING=1><TR><TD><TABLE BORDER=0 CELLSPACING=0
CELLPADDING=0 WIDTH="100%"><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><A
NAME="PAGE_1"><B><FONT SIZE=2 COLOR="#0000FF">LA BELLE
EPOQUE</FONT></B></A></TD></TR><TR><TD ALIGN=CENTER
BGCOLOR="#FFFFFF"><FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY
10003</FONT></TD></TR><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><FONT
SIZE=2>Inspection Date:&nbsp;&nbsp;<font
color="#0000ff">10/02/2003</font></FONT></TD></TR><TR><TD ALIGN=CENTER
BGCOLOR="#FFFFFF"><B><FONT SIZE=2 COLOR="#0000FF">Passed inspection. Follow-up inspection not required.</FONT></B><FONT SIZE=2>
</FONT></TD></TR><TR><TD ALIGN=LEFT BGCOLOR="#FFFFFF"><FONT SIZE=2>No
violations were recorded at the time of this inspection. </FONT><FONT
SIZE=2> </FONT></TD></TR></TABLE></TD></TR>


If I pull the tags that I'm searching on and put them in a new file on
separate lines, as below...

<FONT SIZE=2 COLOR="#0000FF">LA BELLE EPOQUE</FONT>
<FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY 10003</FONT>
<FONT SIZE=2 COLOR="#0000FF">Passed inspection.  Follow-up inspection
not required.</FONT>

...and then run my script on this file, I get all of the occurrences:

LA BELLE EPOQUE
827 BROADWAY, NY 10003
Passed inspection.  Follow-up inspection not required.

Any idea why I'm not getting all of the occurrences when I run my
script on the original html file?

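Because you are only asking the match operator to find the first occurrence of the pattern in each line it reads. Without the /g modifier, a match stops at its first success, so any further FONT elements on the same line are never looked at. You need to tell it to find all the occurrences by adding /g and using the match in list context: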


while (<>) {
    # In list context, /g returns every captured group found on the line.
    print "$_\n" for /<FONT SIZE=2 COLOR="#0000FF">(.*?)<\/FONT>/g;
}
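
The effect of /g in list context can be checked without an input file. The following is only a minimal sketch: the helper name blue_font_text and the sample line are made up for illustration (they are not from the original post), and the sample assumes two of the FONT elements sit on a single line, as they appear to in the HTML above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the text of every
# <FONT SIZE=2 COLOR="#0000FF"> element found in one line of input.
sub blue_font_text {
    my ($line) = @_;
    # Called in list context, a /g match returns all captures of group 1,
    # not just the first one.
    return $line =~ /<FONT SIZE=2 COLOR="#0000FF">(.*?)<\/FONT>/g;
}

# Sample line modelled on the HTML in the question (an assumption: two
# matching elements on the same line).
my $line = '<TD><FONT SIZE=2 COLOR="#0000FF">LA BELLE EPOQUE</FONT></TD>'
         . '<TD><FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY 10003</FONT></TD>';

my @found = blue_font_text($line);
print "$_\n" for @found;
```

Note that this still works one line at a time. Because "." does not match a newline unless the /s modifier is used, and the loop only ever sees one line, an element whose text is split across a line break in the real file would not be matched by either version of the loop.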


John -- use Perl; program fulfillment
