From: Graig Warner <[EMAIL PROTECTED]>
> I have a CGI script that tries to remove the tags from a string that
> represents the content of an HTML file.
>
> There are a number of tags that I would like to keep intact, and I
> represent them in the following array:
>
> @INCLUDE_TAGS = ( "I", "BR", "SUP", "FONT", "P" );
>
> However, some of these tags take arguments (FONT and P, for example)
> and I cannot seem to get the script to leave these tags intact.
>
> Below is the regular expression that I am using to perform the search
> and replace operation:
>
> $html =~ s/<[^>]*[^(@INCLUDE_TAGS.*)]>/ /gi;
>
> I don't know if I've missed something obvious, but if anyone can help
> me out that would be wonderful.
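
As far as I can tell, the substitution misbehaves because square brackets in a regex always form a character class. The array is interpolated (joined with spaces) and every resulting character becomes a member of the negated class, so the pattern only tests the last character before ">", never a tag name. The sample string and the $class and $stripped variables below are made up for illustration; this is a small sketch of the effect, not a fix.

```perl
use strict;
use warnings;

my @INCLUDE_TAGS = ( "I", "BR", "SUP", "FONT", "P" );

# What actually ends up between the square brackets after interpolation:
# the characters ( I space B R S U P F O N T . * ), not a list of tag names.
my $class = "(@INCLUDE_TAGS.*)";

# Hypothetical input: P should be kept, B should be removed.
my $html = '<P align="left">Hi <B>there</B></P>';

# The substitution from the question, applied to a copy of the input.
( my $stripped = $html ) =~ s/<[^>]*[^(@INCLUDE_TAGS.*)]>/ /gi;

print "class members: $class\n";
print "result:        $stripped\n";
# The <P ...> tag ends in a double quote, which is not in the class, so it is
# removed; <B>, </B> and </P> end in B or P, which are in the class, so they
# survive. That is the opposite of what was intended.
```

So a tag is kept or dropped purely by its final character, which is why tags with attributes such as FONT and P are removed.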

It's not that easy to parse HTML properly, so it's best to use a
tested module such as HTML::Parser. You don't have to use it directly,
though. See http://Jenda.Krynicky.cz/#HTML::JFilter

use HTML::JFilter;
$filter = new HTML::JFilter <<'*END*';
b
i
code
pre
br
a: href name
font: color size style
*END*
$filteredHTML = $filter->doSTRING($enteredHTML);

If you are using ActivePerl, you can install HTML::JFilter from my
repository with

ppm install --location=http://Jenda.Krynicky.cz/perl HTML-JFilter

HTH, Jenda

=========== [EMAIL PROTECTED] == http://Jenda.Krynicky.cz ==========
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery
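
P.S. For anyone who would rather use HTML::Parser directly, here is a minimal sketch under a few assumptions: the strip_tags function and the %keep hash are names invented for this example, the keep list is the one from the question, and kept tags are passed through with all their attributes untouched (unlike the filter specification above, which lists allowed attributes per tag). It is not how HTML::JFilter is implemented, only one way the idea could look.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Tags to keep. HTML::Parser reports tag names in lower case by default,
# so the list from the question is lowercased here.
my %keep = map { lc($_) => 1 } qw(I BR SUP FONT P);

# Hypothetical helper: returns $html with every tag not in %keep
# replaced by a single space.
sub strip_tags {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Kept tags are copied verbatim (attributes included);
        # all other start and end tags become a space.
        start_h => [ sub { $out .= $keep{ $_[0] } ? $_[1] : ' ' }, 'tagname, text' ],
        end_h   => [ sub { $out .= $keep{ $_[0] } ? $_[1] : ' ' }, 'tagname, text' ],
        # Plain text is copied through unchanged.
        text_h  => [ sub { $out .= $_[0] }, 'text' ],
        # Comments and declarations have no handler, so they are dropped.
    );

    $p->parse($html);
    $p->eof;
    return $out;
}

print strip_tags('<P align="left">Hi <B>there</B><BR></P>'), "\n";
```

Because attributes on kept tags are not filtered here, this sketch should not be used as-is on untrusted input.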