--- [EMAIL PROTECTED] wrote:
> Thank you very much for all the great help I received earlier on
> extracting numbers from text. There is only one thing I forgot
> about:
> Some of the files have HTML headers and footers. I don't want any
> data inside HTML brackets. I tried:
> s/<*>//g;
> I don't understand why this doesn't work. I actually just need
> s/<*\d*>//g because the other expressions are automatically taken
> care of by the expressions that delete all text. Why doesn't the
> above work for removing html tags? I have several perl books and
> none say the character "<" is reserved. "\<" doesn't work either
> Thanks for any help. (Actually, thanks for writing my program for
> me; although I'm trying hard to do it myself.) ;-)
First,
> s/<*>//g;
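this means remove every > together with any number of <'s directly in
front of it. The "*" applies to the "<", so nothing between the
brackets is ever matched.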
try this:
s/<[^>]*>//g;
which matches a <, any number of characters that are NOT >, and then a >.
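
To see the difference between the two patterns, here is a small sketch
you can run. The sample line is made up for illustration; it is not
from your files.

```perl
use strict;
use warnings;

# Hypothetical sample line, invented for this example.
my $line = '<b>total</b> 123 <br>';

# Original attempt: "<*" is zero or more "<" characters, so the pattern
# only ever removes a ">" plus any run of "<" right before it.
(my $broken = $line) =~ s/<*>//g;

# Suggested fix: "<", then any run of characters other than ">", then ">".
(my $fixed = $line) =~ s/<[^>]*>//g;

print "broken: $broken\n";   # only the ">" characters are gone
print "fixed:  $fixed\n";    # the tags are gone, the text and number stay
```

With the first pattern the tag names and the "<" characters are left
behind, which is why it looked like nothing useful happened.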
You *could* have said
s/<.*>//g;
with the "." being a regex wildcard, but ".*" is greedy: for
"<br> a 123 b <br>" it would grab the whole line, from the first < to
the last >. Instead, you could say s/<.*?>//g, which tries to match a
minimal-length string (because of the question mark).
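
A quick way to compare the three variants on that same example line is
sketched below. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $line = '<br> a 123 b <br>';   # the example line discussed above

# Greedy: ".*" runs to the LAST ">" on the line, so everything goes.
(my $greedy = $line) =~ s/<.*>//g;

# Non-greedy: ".*?" stops at the first ">" that lets the match succeed.
(my $lazy = $line) =~ s/<.*?>//g;

# Negated class: "[^>]*" can never run past a ">" in the first place.
(my $class = $line) =~ s/<[^>]*>//g;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
print "class:  [$class]\n";
```

The last two give the same result here. One difference to keep in mind:
"." does not match a newline unless you add the /s modifier, while
[^>] does, so the negated class also handles a tag that is split across
lines if you have the whole file in one string. None of these patterns
is a real HTML parser; a ">" inside an attribute value or a comment
will still confuse them.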