From: WC -Sx- Jones <[EMAIL PROTECTED]>
> Comments about how to perform these
> 5 checks as ONE TEST are welcome -
>
> /\s+\w+\<g.*\>\w+\s+/ REJECT Invalid HTML Spam vG
> /\s+\w+\<j.*\>\w+\s+/ REJECT Invalid HTML Spam vJ
> /\s+\w+\<k\w{3,}\>\w+\s+/ REJECT Invalid HTML Spam vK
> /\s+\w+\<y.*\>\w+\s+/ REJECT Invalid HTML Spam vY
> /\s+\w+\<z.*\>\w+\s+/ REJECT Invalid HTML Spam vZ
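
Taken literally, the five checks differ only in the letter after "<" and in the tail used for "k", so they can be folded into one alternation. The following is a minimal sketch, not tested against the real filter: the names $invalid_tag and is_invalid_html_spam are made up for illustration, and the per-letter vG/vJ/... suffixes are lost because there is only one rule left.

```perl
use strict;
use warnings;

# One pattern instead of five: g, j, y and z keep the original ".*"
# tail, while k keeps the stricter "\w{3,}" tail so <kbd> is not hit.
# ($invalid_tag and is_invalid_html_spam are hypothetical names.)
my $invalid_tag = qr/\s+\w+<(?:[gjyz].*|k\w{3,})>\w+\s+/;

sub is_invalid_html_spam {
    my ($line) = @_;
    return $line =~ $invalid_tag ? 1 : 0;
}

# Small demonstration on a few sample lines.
for my $line (' foo<gxyz>bar ', ' foo<kbd>bar ', ' foo<kxyz>bar ', ' foo<b>bar ') {
    printf "%-18s %s\n", "'$line'", is_invalid_html_spam($line) ? 'REJECT' : 'ok';
}
```

In the table syntax of the quoted rules this would presumably be the single line

    /\s+\w+<(?:[gjyz].*|k\w{3,})>\w+\s+/ REJECT Invalid HTML Spam
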

Are you really sure you want to do it like this? This will not reject
most incorrect tags, and it does not even try to protect you from
malformed HTML or from cross-site scripting. You might want to do
something like:

    use HTML::Entities;

    # Escape everything that does not look like a well-formed entity,
    # tag or comment; the well-formed pieces are passed through untouched.
    sub PolishHTML {
        my $str = shift;
        if ($AllowXHTML) {
            # XHTML: also accept self-closing tags such as <br />
            $str =~ s{(.*?)(&\w+;|&#\d+;|<\w[\w\d]*(?:\s+\w[\w\d]*(?:\s*=\s*(?:[^" '><\s]+|(?:'[^']*')+|(?:"[^"]*")+))?)*\s*/?>|</\w[\w\d]*>|<!--.*?-->|$)}
                     {HTML::Entities::encode($1, '^\r\n\t !\#\$%\"\'-;=?-~').$2}gem;
        } else {
            # Plain HTML: no self-closing form
            $str =~ s{(.*?)(&\w+;|&#\d+;|<\w[\w\d]*(?:\s+\w[\w\d]*(?:\s*=\s*(?:[^" '><\s]+|(?:'[^']*')+|(?:"[^"]*")+))?)*\s*>|</\w[\w\d]*>|<!--.*?-->|$)}
                     {HTML::Entities::encode($1, '^\r\n\t !\#\$%\"\'-;=?-~').$2}gem;
        }
        return $str;
    }

first to "polish" the HTML and escape the stuff that does not look like
proper HTML, and then use something like HTML::JFilter or HTML::Filter
to get rid of nonexistent or dangerous tags and attributes.

> I am not interested in a Perl module
> as the "pcre" environment I am using
> would require huge CPU eating filters.

Doesn't look like it, but if you did happen to need it under Windows,
you may find both the polishing and the filtering in the
http://Jenda.Krynicky.cz/#Jenda.Rex COM object.

> Maybe something like this working/production code:
>
> /\s+\w+\<(?=g|j|y|z).*\>\w+\s+/ REJECT Invalid HTML
>
> But I am not sure how to handle the
> K (kbd) which ultimately could be valid...

You should also be able to handle things like this:

    <input type=text name=foo value="100>10 and 20<x">

and

    <inpt ...>

and most probably be able to remove these:

    <img src="..." onMouseOver="window.open('http://...')">

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery
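
P.S. To make the value="100>10 and 20<x" case concrete, here is a minimal sketch comparing a naive "everything up to the next >" pattern with the quote-aware tag pattern adapted from PolishHTML above. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Naive idea of a tag: "<", anything that is not ">", then ">".
my $naive = qr{<[^>]*>};

# Quote-aware tag pattern adapted from the PolishHTML regex:
# attribute values may be bare, single-quoted or double-quoted.
my $tag = qr{<\w[\w\d]*(?:\s+\w[\w\d]*(?:\s*=\s*(?:[^"'><\s]+|(?:'[^']*')+|(?:"[^"]*")+))?)*\s*/?>};

my $html = q{<input type=text name=foo value="100>10 and 20<x">};

my ($naive_match) = $html =~ /($naive)/;
my ($aware_match) = $html =~ /($tag)/;

# The naive pattern stops at the ">" inside the quoted value;
# the quote-aware one consumes the whole tag.
print "naive: $naive_match\n";
print "aware: $aware_match\n";
```

Note that the quote-aware pattern is purely syntactic: it would accept <inpt ...> just as happily, which is why a separate filter with a list of known tags and attributes is still needed afterwards.
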