Re: Trying to strip HTML tags

Jenda Krynicky Wed, 16 Oct 2002 04:14:32 -0700

From:                   Graig Warner <[EMAIL PROTECTED]>
> I have a CGI script that tries to remove the tags from a string that
> represents the content of an HTML file.
> 
> There are a number of tags that I would like to keep intact, and I
> represent them in the following array:
> 
> @INCLUDE_TAGS = ( "I", "BR", "SUP", "FONT", "P" );
> 
> However, some of these tags take arguments (FONT and P, for example)
> and I cannot seem to get the script to leave these tags intact.
> 
> Below is the regular expression that I am using to perform the search
> and replace operation:
> 
> $html =~ s/<[^>]*[^(@INCLUDE_TAGS.*)]>/ /gi;
> 
> I don't know if I've missed something obvious, but if anyone can help
> me out that would be wonderful.


Parsing HTML properly is not that easy, so it's best to use a tested
module such as HTML::Parser.
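For what it's worth, the regex in the question can't work as written.
`[^(@INCLUDE_TAGS.*)]` is a character class, so it matches a single
character that is not one of the characters inside the brackets (the
letters of the interpolated tag names, a space, `(`, `.`, `*` and `)`).
It does not mean "anything except these tag names". Any tag whose last
character before `>` is not in that set gets removed, which is why
`<FONT color="red">` disappears. A negative lookahead expresses the
intended logic. This is a rough sketch that assumes the `@INCLUDE_TAGS`
list from the question; `strip_tags` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Tag names to keep, taken from the question.
my @INCLUDE_TAGS = ( "I", "BR", "SUP", "FONT", "P" );

# Build an alternation like I|BR|SUP|FONT|P (quotemeta for safety).
my $keep = join '|', map { quotemeta } @INCLUDE_TAGS;

# Hypothetical helper: replace every tag NOT in @INCLUDE_TAGS with a space.
sub strip_tags {
    my ($html) = @_;
    # <, then a negative lookahead: fail if an optional / is followed
    # by one of the kept names; \b stops "P" from also protecting
    # <PRE> or <PARAM>. Then consume the rest of the tag up to >.
    $html =~ s{<(?!/?(?:$keep)\b)[^>]*>}{ }gi;
    return $html;
}

print strip_tags('<B>bold</B> <FONT color="red">x</FONT><P align=center>y'), "\n";
```

Even so, this breaks on a `>` inside a quoted attribute value, on
comments, `<script>` blocks and so on, which is exactly why a real
parser is the safer choice.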

You don't have to use HTML::Parser directly, though.

See http://Jenda.Krynicky.cz/#HTML::JFilter

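        use HTML::JFilter;

        # Whitelist: tags to keep; a "tag: attr ..." line also lists
        # the attributes allowed on that tag. When you paste this into
        # a script, the *END* terminator must start in column 0.
        my $filter = HTML::JFilter->new(<<'*END*');
        b i code pre br
        a: href name
        font: color size style
        *END*

        # $enteredHTML holds the HTML you want to clean up
        my $filteredHTML = $filter->doSTRING($enteredHTML);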


If you are using ActivePerl, you can install HTML::JFilter from my PPM
repository with:
        ppm install --location=http://Jenda.Krynicky.cz/perl HTML-JFilter
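If you would rather call HTML::Parser directly, the same whitelist idea
looks roughly like this. It is an illustration, not HTML::JFilter's
code: `filter_html` and the `%keep` list are hypothetical, whitelisted
tags are copied through with all of their attributes (there is no
per-attribute filtering like the `font: color size style` line above),
`<script>` and `<style>` elements are dropped entirely, and comments and
other markup are discarded because no handler is registered for them.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical whitelist; HTML::Parser reports tag names in lower case.
my %keep = map { $_ => 1 } qw(i br sup font p);

sub filter_html {
    my ($html) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Copy whitelisted start/end tags through verbatim (attributes included).
        start_h => [ sub { $out .= $_[1] if $keep{ $_[0] } }, 'tagname, text' ],
        end_h   => [ sub { $out .= $_[1] if $keep{ $_[0] } }, 'tagname, text' ],
        # Plain text passes through unchanged (entities are left as-is).
        text_h  => [ sub { $out .= $_[0] }, 'text' ],
    );
    # Suppress these elements and everything inside them.
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;    # flush any buffered text
    return $out;
}

print filter_html('<b>bold</b> <font color="red">x</font>'), "\n";
```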

HTH, Jenda
=========== [EMAIL PROTECTED] == http://Jenda.Krynicky.cz ==========
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

