There's already a lot of good advice here, but just one more thing...
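Some people write HTML code with a single tag split across more than one line.
Using:
s/<.*?>//g
doesn't account for that, and it won't match such a tag, because '.' does not
match a newline by default.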
To allow '.' to match line breaks in tags, use:
s/<.*?>//gs
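Here is a small self-contained sketch of the difference. The input string is a made-up example with one tag split across two lines, not something from the original post:

```perl
use strict;
use warnings;

# Hypothetical input: the <a ...> tag is split across two lines.
my $html = qq{<p>Click <a\nhref="page.html">here</a> now</p>};

# Without /s, '.' does not match "\n", so the split tag survives.
(my $without_s = $html) =~ s/<.*?>//g;

# With /s, '.' also matches "\n", so the split tag is removed too.
(my $with_s = $html) =~ s/<.*?>//gs;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

A negated character class such as s/<[^>]*>//g is a common alternative: [^>] matches a newline on its own, so it needs no /s. Either way, this only handles simple markup; it is not a full HTML parser.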
- Johnathan
On Tue, 24 Apr 2001 [EMAIL PROTECTED] wrote:
> Thank you very much for all the great help I received earlier on extracting
> numbers from text. There is only one thing I forgot about:
>
> Some of the files have HTML headers and footers. I don't want any data
> inside HTML brackets. I tried:
: Try s/<.*>//g - the . means "any character" and will eliminate a
: less-than, then 0 or more characters, then a greater than.
Careful: if there's more than one greater-than on the line, this regex
will wipe out everything between (and including) the first "<" and the
last ">" on the line, because ".*" is greedy and matches as much as it can.
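A minimal sketch of greedy versus non-greedy matching, assuming a hypothetical line that contains two tags:

```perl
use strict;
use warnings;

# Hypothetical line containing two tags.
my $line = 'Total: <b>42</b> items';

# Greedy: .* runs to the LAST '>' on the line, swallowing the number too.
(my $greedy = $line) =~ s/<.*>//g;

# Non-greedy: .*? stops at the FIRST '>', so each tag is removed separately.
(my $lazy = $line) =~ s/<.*?>//g;

print "greedy: '$greedy'\n";   # greedy: 'Total:  items'
print "lazy:   '$lazy'\n";     # lazy:   'Total: 42 items'
```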
: Some of the files have HTML headers and footers. I don't want any data
: inside HTML brackets. I tried:
:
: s/<*>//g;
:
: I don't understand why this doesn't work.
Because (a) "<*" in a regex means "zero or more less-thans", and (b)
Perl regex matching is greedy: it matches the longest string it can.
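To see point (a) concretely, here is a short sketch that runs the original s/<*>//g next to the non-greedy version on a hypothetical input string:

```perl
use strict;
use warnings;

my $html = '<b>bold</b> text';    # hypothetical input

# '<*' means "zero or more '<' characters", so '<*>' matches a '>'
# optionally preceded by '<'s. Only the '>' of each tag is removed.
(my $broken = $html) =~ s/<*>//g;

# '<.*?>' means '<', then as few characters as possible, then '>'.
(my $fixed = $html) =~ s/<.*?>//g;

print "$broken\n";   # prints: <bbold</b text
print "$fixed\n";    # prints: bold text
```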
Thank you very much for all the great help I received earlier on extracting
numbers from text. There is only one thing I forgot about:
Some of the files have HTML headers and footers. I don't want any data
inside HTML brackets. I tried:
s/<*>//g;
I don't understand why this doesn't work.