That fixed it! Thank you very much!
Clint

Wiggins d'Anconia wrote:
Clint wrote:

I'm trying to scrape a section of HTML and don't see why my regexp stopped working this week. The relevant two-line sample section from

http://www.srh.noaa.gov/data/forecasts/SCZ020.php

is:


<td><b>Barometer</b>:</td> <td align="right" nowrap>30.22&quot; (1023.1 mb)</td>


(notice the space before <td align="right" nowrap>)


My Perl segment (that was working until this week) is:


sub barometer {
    local $_ = shift;
    m{<td><b>Barometer</b>:</td>\n\s<td align="right" nowrap>(.*?)&quot;} || die "No barometer data";
    return $1;
}


(the match line is one long line, but it's wrapping here in email.)

Now, this match *will* find the pressure if I shorten it to:
m{<td align="right" nowrap>(.*?)&quot;} || die "No barometer data";

However, there are other weather quantities I also want to grab that have a similar HTML structure but lack the nowrap option, so I need to be able to match across the two lines of HTML. Until now, the \n\s worked as expected, matching a linefeed followed by a single whitespace character, and I can't see any change in the structure that would break the search pattern.

Is there something obvious in the html structure that I've missed here? I appreciate any advice you might have.


I don't know if it is your mail client (or mine), but there appear to be two spaces, one on either side of the newline. Try changing your \n\s to \s+\n\s+ or \s*\n\s*; both are valid and make the match more tolerant of whitespace changes.
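As a rough sketch of that suggestion (not the exact code from this thread), the sub below loosens the whitespace handling. It goes one step further than the advice above: since \s already matches a newline in Perl, a bare \s* covers "space, newline, space" as well as a single space or nothing at all. The <td[^>]*> part, which accepts a cell with or without nowrap, and the choice to return undef instead of dying are assumptions added for illustration.

```perl
use strict;
use warnings;

# Return the barometer reading in inches from a chunk of HTML,
# or undef if the pattern is not found.
# \s* matches any run of spaces, tabs, carriage returns, and newlines
# (including none), so the layout between the two cells can change.
# <td[^>]*> is an illustrative assumption: it accepts any attributes
# on the second cell, e.g. with or without "nowrap".
sub barometer {
    my ($html) = @_;
    return $html =~ m{<td><b>Barometer</b>:</td>\s*<td[^>]*>(.*?)&quot;}
        ? $1
        : undef;
}

# Hypothetical sample inputs: one line, two lines with spaces on
# either side of the newline, and a CRLF break without nowrap.
my @samples = (
    qq{<td><b>Barometer</b>:</td> <td align="right" nowrap>30.22&quot; (1023.1 mb)</td>},
    qq{<td><b>Barometer</b>:</td> \n <td align="right" nowrap>30.22&quot; (1023.1 mb)</td>},
    qq{<td><b>Barometer</b>:</td>\r\n<td align="right">30.22&quot; (1023.1 mb)</td>},
);

for my $html (@samples) {
    my $reading = barometer($html);
    print defined $reading ? "$reading\n" : "no match\n";
}
```

Returning undef leaves it to the caller to decide whether missing data is fatal, for example: my $p = barometer($page) or die "No barometer data";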


In general, using regexps to match HTML is a bad idea...

http://danconia.org





-- Clint <[EMAIL PROTECTED]>

