I'm using a regular expression to extract text from an html file. My code:

while (<>) {
    if (/\<FONT SIZE=2 COLOR="#0000FF">(.*?)<\/FONT>/) {
        print "$1\n";
    }
}

When I run it on this big glom of HTML, I get only the first
occurrence, the words LA BELLE EPOQUE:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">  <html>  <head> 
<title>  Summary of Latest Inspection  </title>  </head>  <body>  <p> 
<TABLE BORDER CELLPADDING=1><TR><TD><TABLE BORDER=0 CELLSPACING=0
CELLPADDING=0 WIDTH="100%"><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><A
NAME="PAGE_1"><B><FONT SIZE=2 COLOR="#0000FF">LA BELLE
EPOQUE</FONT></B></A></TD></TR><TR><TD ALIGN=CENTER
BGCOLOR="#FFFFFF"><FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY
10003</FONT></TD></TR><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><FONT
SIZE=2>Inspection Date:&nbsp;&nbsp;<font
color="#0000ff">10/02/2003</font></FONT></TD></TR><TR><TD ALIGN=CENTER
BGCOLOR="#FFFFFF"><B><FONT SIZE=2 COLOR="#0000FF">Passed inspection. 
Follow-up inspection not required.</FONT></B><FONT SIZE=2>
</FONT></TD></TR><TR><TD ALIGN=LEFT BGCOLOR="#FFFFFF"><FONT SIZE=2>No
violations were recorded at the time of this inspection. </FONT><FONT
SIZE=2> </FONT></TD></TR></TABLE></TD></TR>
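A minimal, self-contained reproduction, assuming (a guess, not stated above) that the HTML is stored on a single physical line in the original file and was only wrapped by the mail client. In that case `<>` hands the loop one line, and an `if` test matches at most once per line, while a `while` loop with the `/g` modifier resumes after each match and picks up every occurrence. The sample string is a trimmed, hypothetical copy of the markup above.

```perl
use strict;
use warnings;

# Hypothetical sample: a trimmed copy of the HTML above, assumed to be
# stored on ONE physical line, as it may be in the original file.
my $line = '<A NAME="PAGE_1"><B><FONT SIZE=2 COLOR="#0000FF">LA BELLE EPOQUE</FONT></B></A>'
         . '<TD><FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY 10003</FONT></TD>'
         . '<B><FONT SIZE=2 COLOR="#0000FF">Passed inspection.  Follow-up inspection not required.</FONT></B>';

my $re = qr{<FONT SIZE=2 COLOR="#0000FF">(.*?)</FONT>};

# Original approach: `if` tests the line once, so at most one capture.
my @first_only;
if ($line =~ $re) {
    push @first_only, $1;
}

# Alternative: `while` with /g continues from where the last match
# ended (tracked by pos()), so every occurrence on the line is captured.
my @all;
while ($line =~ /$re/g) {
    push @all, $1;
}

print "if:    $_\n" for @first_only;
print "while: $_\n" for @all;
```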

If I pull out the tags I'm searching for and put them in a new file,
one per line, as below...

<FONT SIZE=2 COLOR="#0000FF">LA BELLE EPOQUE</FONT>
<FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY 10003</FONT>
<FONT SIZE=2 COLOR="#0000FF">Passed inspection.  Follow-up inspection not required.</FONT>

...and then run my script on this file, I get all of the occurrences:

LA BELLE EPOQUE
827 BROADWAY, NY 10003
Passed inspection.  Follow-up inspection not required.

Any idea why I'm not getting all of the occurrences when I run my
script on the original html file?
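If the file really does contain line breaks inside the tags, as the wrapped copy above suggests (for example `<FONT` and `SIZE=2` on different lines), reading line by line can never see a whole tag. A sketch that reads all of the input at once instead, then uses `/s` so `.` can cross newlines and `/g` to visit every match; the in-memory sample is hypothetical and stands in for the real file, whose name is not given.

```perl
use strict;
use warnings;

# Hypothetical input: tags split across physical lines, as in the wrapped
# copy of the HTML above. An in-memory filehandle stands in for the file.
my $html = qq{<B><FONT SIZE=2 COLOR="#0000FF">LA BELLE\nEPOQUE</FONT></B>\n}
         . qq{<TD><FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY\n10003</FONT></TD>\n};
open my $fh, '<', \$html or die "open: $!";

# Slurp mode: with $/ undefined, one read returns the whole input,
# so a tag that spans lines is seen as a single string.
my $text = do { local $/; <$fh> };
close $fh;

my @found;
# \s+ tolerates a line break between attributes; /i ignores case,
# /s lets . match newlines, /g iterates over every match.
while ($text =~ m{<FONT\s+SIZE=2\s+COLOR="#0000FF">(.*?)</FONT>}gis) {
    (my $t = $1) =~ s/\s+/ /g;   # collapse embedded newlines to single spaces
    push @found, $t;
}
print "$_\n" for @found;
```

To run it on the real file, the slurp line can read from `<>` instead: `my $text = do { local $/; <> };`. For anything beyond a quick extraction, an HTML parsing module such as HTML::TreeBuilder is sturdier than a regular expression.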

Thanks.

Dan
