: Some of the files have HTML headers and footers. I don't want any data
: inside HTML brackets. I tried:
:
: s/<*>//g;
:
: I don't understand why this doesn't work.
Because (a) "<*" in a regex means "zero or more less-thans", so your
pattern only removes each ">" (plus any "<" right in front of it), and
(b) even if you write "<.*>", Perl regex matching is greedy: it matches
the longest string that can make the match true. What you want to do is
something like this:
s/<.*?>//g;
In this one, ".*" means "zero or more of anything" (other than a
newline), which under normal circumstances would include > as well,
except that ".*?" means "match the shortest string that makes the regex
true". So "<.*?>" will match from each < to the nearest > after it.
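To see the difference, here is a small sketch that runs the original
attempt, a greedy version, and the non-greedy version against the same
line. The sample string and the variable names are made up for
illustration; they are not from the original poster's files.

```perl
use strict;
use warnings;

# Hypothetical sample line, invented for this illustration.
my $line = '<b>bold</b> and <i>italic</i> text';

# Original attempt: "<*" is zero or more "<", so this only strips
# each ">" (and any "<" characters directly in front of it).
(my $broken = $line) =~ s/<*>//g;

# Greedy ".*" runs from the first "<" to the last ">" on the line.
(my $greedy = $line) =~ s/<.*>//g;

# Non-greedy ".*?" stops at the first ">" after each "<".
(my $lazy = $line) =~ s/<.*?>//g;

print "broken: [$broken]\n";   # [<bbold</b and <iitalic</i text]
print "greedy: [$greedy]\n";   # [ text]
print "lazy:   [$lazy]\n";     # [bold and italic text]
```

Only the last one leaves the text between the tags alone.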
Alternatively, you could do this:
s/<[^>]*>//g;
which says "match a <, followed by zero or more characters that aren't
>, and then a >". I think the first looks cleaner, but the second is
easier to read aloud and explain.
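One practical difference between the two, in case your tags ever wrap
across lines: "." does not match a newline unless you add the /s
modifier, while "[^>]" does. A rough sketch follows; the sample string
is invented, and it assumes the whole file has been read into one
string rather than processed line by line.

```perl
use strict;
use warnings;

# Hypothetical tag that is split across two lines.
my $html = "<a\nhref='x'>link</a>";

# "." stops at the newline, so the opening tag is left in place.
(my $dot = $html) =~ s/<.*?>//g;

# With /s, "." matches newlines too.
(my $dot_s = $html) =~ s/<.*?>//gs;

# The negated class matches newlines without any modifier.
(my $neg = $html) =~ s/<[^>]*>//g;

print "dot:   [$dot]\n";     # opening tag still there, "</a>" removed
print "dot_s: [$dot_s]\n";   # [link]
print "neg:   [$neg]\n";     # [link]
```

Neither pattern is a real HTML parser (a ">" inside an attribute value
or a comment will confuse both), but for stripping simple headers and
footers they should do.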
: Thanks for any help. (Actually, thanks for writing my program for me;
: although I'm trying hard to do it myself.) ;-)
oh... in that case, ignore everything we've said. ;)
--
Tim Kimball · ACDSD / MAST ¦
Space Telescope Science Institute ¦ We are here on Earth to do good to others.
3700 San Martin Drive ¦ What the others are here for, I don't know.
Baltimore MD 21218 USA ¦ -- W.H. Auden