Re: More Help: Complex Regex

Paul Tue, 24 Apr 2001 13:21:52 -0700

--- [EMAIL PROTECTED] wrote:
> Thank you very much for all the great help I received earlier on
> extracting numbers from text.  There is only one thing I forgot
> about:
> Some of the files have HTML headers and footers.  I don't want any
> data inside HTML brackets.   I tried:
>   s/<*>//g; 
> I don't understand why this doesn't work. I actually just need
> s/<*\d*>//g because the other expressions are automatically taken
> care of by the expressions that delete all text.  Why doesn't the
> above work for removing html tags?  I have several perl books and
> none say the character "<" is reserved. "\<" doesn't work either
> Thanks for any help.  (Actually, thanks for writing my program for
> me; although I'm trying hard to do it myself.)   ;-)

First,
>   s/<*>//g; 

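In a regex, * is a quantifier ("zero or more of the preceding item"),
not a shell-style wildcard. So this means: remove every > preceded by
any number of <'s. The "<" is not reserved, so escaping it changes
nothing. Try this:
 s/<[^>]*>//g;

which matches a <, any number of characters that are NOT >, and then a >.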
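
Here is a minimal sketch of the difference between the two substitutions. The sample line and the variable names are made up for illustration; they are not from your files.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen only to illustrate the point.
my $line = "<br> a 123 b <br>";

# Original attempt: zero or more '<' followed by '>'.
# No '<' ever sits directly before a '>', so only the '>' characters go.
(my $broken = $line) =~ s/<*>//g;

# Negated character class: '<', any run of non-'>' characters, then '>'.
(my $fixed = $line) =~ s/<[^>]*>//g;

print "broken: [$broken]\n";   # broken: [<br a 123 b <br]
print "fixed:  [$fixed]\n";    # fixed:  [ a 123 b ]
```

The brackets in the output are only there to make the leftover spaces visible.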

You *could* have said 
   s/<.*>//g; 

with the "." being the regex wildcard, but .* is greedy: for
"<br> a 123 b <br>" it would grab the whole line, from the first < to
the last >. Instead, you could say  s/<.*?>//g, which matches the
shortest possible string (because of the question mark).
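
To see greedy versus minimal matching side by side, here is a small sketch run on the same example line (the variable names are just placeholders):

```perl
use strict;
use warnings;

my $sample = "<br> a 123 b <br>";   # the example line from above

# Greedy: .* runs as far as it can, so the match ends at the LAST '>'.
(my $greedy = $sample) =~ s/<.*>//g;

# Minimal: .*? stops as soon as it can, so each match ends at the FIRST '>'.
(my $lazy = $sample) =~ s/<.*?>//g;

print "greedy: [$greedy]\n";   # greedy: []
print "lazy:   [$lazy]\n";     # lazy:   [ a 123 b ]
```

One difference to keep in mind: by default "." does not match a newline, so if you have slurped a whole file into one string, s/<.*?>//g will skip a tag that is split across two lines unless you add the /s modifier. The [^>]* version does match newlines, so it handles that case as is.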


