I'm using a regular expression to extract text from an HTML file. My code:

    while (<>) {
        if (/<FONT SIZE=2 COLOR="#0000FF">(.*?)<\/FONT>/) {
            print "$1\n";
        }
    }
When I run it on this big glom of HTML, I get only the first occurrence, the words LA BELLE EPOQUE:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
    <html>
    <head>
    <title>
    Summary of Latest Inspection
    </title>
    </head>
    <body>
    <p>
    <TABLE BORDER CELLPADDING=1><TR><TD><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0 WIDTH="100%"><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><A NAME="PAGE_1"><B><FONT SIZE=2 COLOR="#0000FF">LA BELLE EPOQUE</FONT></B></A></TD></TR><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY 10003</FONT></TD></TR><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><FONT SIZE=2>Inspection Date: <font color="#0000ff">10/02/2003</font></FONT></TD></TR><TR><TD ALIGN=CENTER BGCOLOR="#FFFFFF"><B><FONT SIZE=2 COLOR="#0000FF">Passed inspection. Follow-up inspection not required.</FONT></B><FONT SIZE=2> </FONT></TD></TR><TR><TD ALIGN=LEFT BGCOLOR="#FFFFFF"><FONT SIZE=2>No violations were recorded at the time of this inspection. </FONT><FONT SIZE=2> </FONT></TD></TR></TABLE></TD></TR>

If I pull out the tags that I'm searching on and put them in a new file on separate lines, as below...

    <FONT SIZE=2 COLOR="#0000FF">LA BELLE EPOQUE</FONT>
    <FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY 10003</FONT>
    <FONT SIZE=2 COLOR="#0000FF">Passed inspection. Follow-up inspection not required.</FONT>

...and then run my script on that file, I get all of the occurrences:

    LA BELLE EPOQUE
    827 BROADWAY, NY 10003
    Passed inspection. Follow-up inspection not required.

Any idea why I'm not getting all of the occurrences when I run my script on the original HTML file?

Thanks.
Dan
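A likely explanation, offered as a sketch rather than a definitive answer: `while (<>)` hands the loop one line at a time, and `if (/.../)` tests that line only once, so a line holding several matching FONT elements yields just the first one. In the original HTML the three elements appear to sit on the same line, while the hand-made test file puts each on its own line. Looping over the line with `while` plus the `/g` modifier picks up every match. The helper name `extract_blue_font` and the sample string below are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the question (case-sensitive, as in the original).
my $pattern = qr{<FONT SIZE=2 COLOR="#0000FF">(.*?)</FONT>};

# Hypothetical helper: return the text of EVERY matching element in $text,
# not just the first. Each pass of while (... =~ /.../g) resumes the search
# where the previous match ended.
sub extract_blue_font {
    my ($text) = @_;
    my @found;
    while ($text =~ /$pattern/g) {
        push @found, $1;
    }
    return @found;
}

# Sample input: several FONT elements concatenated into ONE line,
# mimicking the table row in the original file.
my $html =
    '<TR><TD><A NAME="PAGE_1"><B><FONT SIZE=2 COLOR="#0000FF">LA BELLE EPOQUE</FONT></B></A></TD></TR>'
  . '<TR><TD><FONT SIZE=2 COLOR="#0000FF">827 BROADWAY, NY 10003</FONT></TD></TR>'
  . '<TR><TD><FONT SIZE=2>Inspection Date: <font color="#0000ff">10/02/2003</font></FONT></TD></TR>'
  . '<TR><TD><B><FONT SIZE=2 COLOR="#0000FF">Passed inspection. Follow-up inspection not required.</FONT></B></TD></TR>';

print "$_\n" for extract_blue_font($html);
```

To process a real file line by line, the same helper drops into the original loop as `while (my $line = <>) { print "$_\n" for extract_blue_font($line); }`. If an element might be split across lines, one option is to read the whole file at once with `local $/;` and add the `/s` modifier so that `.` also matches newlines.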