I have used this little routine to strip HTML. It might be inefficient, I don't know.
Assuming the HTML has been loaded into the variable $html:

    # Collapse the document onto one line, then break after every '>'
    # so that each piece ends with at most one tag.
    $html =~ s/\n//g;
    $html =~ s/>/>\n/g;
    @html = split(/\n/, $html);
    foreach $_ (@html) {
        # Only one '>' is left per piece, so the greedy .* stops at that tag.
        $_ =~ s/<.*>//g;
        $newhtml .= $_;
    }
    print $newhtml;

Agustin Rivera

----- Original Message -----
From: "Etienne Marcotte" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 13, 2001 11:44 AM
Subject: YARQ (Yet Another Regexp Question)

> I saw somewhere on the web a good regexp for removing html tags. Can't
> re-find it and it needed some minor mods.
>
> Let's say the $line is 'this is a <font size="2">large word</font>in
> size 2';
>
> I played a little around, but it always removed between the first < and
> the last > (and I know the tutorial on the web said how to avoid this)
>
> I'd like to make something like this (I know this one's not good, but
> please help place parenthesis and [] and {} :)
>
> .*        < (.*) \s .*       > .*         </ \1   > .*
> this is a < font    size="2" > large word </ font > in size 2
>
> the above line shows what the match is for each part...
>
> thanks for help...
>
> And also is there a way to specify a list of allowed tags? or a list of
> unallowed tags.
> like if the (.*) is foo or bar to delete, keep if something else...
>
> I don't think it's clear, but I'll try to help if you need more details
> on what I'm trying to accomplish
>
> Etienne
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
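
On the two points in the quoted question (the match running from the first < to the last >, and keeping a list of allowed tags), here is a minimal sketch rather than a definitive answer. It assumes no attribute value contains a literal '>' and that comments and doctype declarations can be left alone. The names strip_tags and strip_except are made up for illustration. For real-world HTML a proper parser such as HTML::Parser from CPAN is likely more robust.

```perl
use strict;
use warnings;

# Strip every tag. [^>]* cannot run past the first '>', so each tag is
# matched on its own instead of everything from the first '<' to the
# last '>' on the line.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;
    return $html;
}

# Hypothetical helper: strip opening and closing tags unless the tag
# name is in the allowed list. Names are compared case-insensitively.
# Anything not starting with a letter after '<' (comments, doctype)
# is not matched and therefore left untouched.
sub strip_except {
    my ($html, @allowed) = @_;
    my %keep = map { (lc($_) => 1) } @allowed;
    $html =~ s{(<\s*/?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>)}
              { $keep{ lc($2) } ? $1 : '' }ge;
    return $html;
}

my $line = 'this is a <font size="2">large word</font>in size 2';
print strip_tags($line), "\n";
# prints: this is a large wordin size 2

print strip_except('<b>bold</b> and <font size="2">large</font>', 'b'), "\n";
# prints: <b>bold</b> and large
```

To delete only a list of unallowed tags instead, the same substitution should work with the test flipped, i.e. returning '' when the name is in the hash and $1 otherwise.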