It depends on what YOU want to allow in the way of basic HTML... on some parts
of our sites we allow <B>, <I> and <A>; on other parts we don't allow <A>.

The reason I mention this is that javascript can exist inside an <A>, which
is an issue Lief will have to look at.

<A HREF="javascript:self.close();">bye bye</a> is a pretty evil piece of
code :)

It gets worse:

Javascript can be placed inside MANY objects using event handlers like

onmouseover
onmouseout
onclick
etc

... not just within <SCRIPT> tags.

So you can't just opt to remove any links that begin with "javascript:",
because this won't strip any of the events above.

All of these will close the window, in one way or another, on JS-enabled
browsers (perhaps not all of them, but does it matter?):

<A HREF="javascript:self.close();">bye bye</a>
<A HREF="#" onmouseover="javascript:self.close();" >bye bye</a>
<A HREF="#" onmouseout="javascript:self.close();" >bye bye</a>
<A HREF="#" onclick="javascript:self.close();" >bye bye</a>


But it gets worse :)

<DIV onmouseover="javascript:self.close();">close</DIV>
<P onmouseover="javascript:self.close();">close</P>

Both achieve the same thing, so the above events are not tied to <A> tags.

This means that any allowed tag could be used maliciously with an event like
onmouseover to cause havoc on any site.


So we could strip out ALL of the above events (and the many more that
exist), but then we'd be taking away the ability for these events to work in
our favour on CSS and DHTML projects... so that's a choice you'd have to
make.


An ideal solution would be a function that looked for a list of events
(like above, but more of them) in a string, and stripped them out.  Checking
whether the value begins with 'javascript:' isn't enough here, since browsers
run an event handler's value as script with or without that prefix.  If <A>
is one of your allowed HTML tags when using strip_tags(), it would also have
to strip out any HREF which begins with 'javascript:', or perhaps strip out
the entire <A> tag.

So

<DIV onmouseover="javascript:self.close();">close</DIV>
would become
<DIV >close</DIV>

<A HREF="#" onmouseover="javascript:self.close();">bye bye</a>
would become
<A HREF="#" >bye bye</a>

<A HREF="javascript:self.close();">bye bye</a>
would become
<A>bye bye</a> or bye bye
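The three transformations above could be roughed out with a couple of regular expressions. This is only a naive sketch: strip_js_attributes() is a made-up name, not a PHP function, and the patterns assume reasonably tidy markup. They will also touch text outside of tags that happens to look like an attribute, and they won't catch a 'javascript:' that has been disguised with HTML entities.

```php
<?php
// Hypothetical helper (not part of PHP): regex-based sketch only.
function strip_js_attributes($html)
{
    // Remove any on* event handler attribute (onclick, onmouseover, ...)
    // whether its value is double-quoted, single-quoted or unquoted.
    $html = preg_replace(
        '/\s+on[a-z]+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
        '',
        $html
    );

    // Remove HREF attributes whose value begins with "javascript:".
    $html = preg_replace(
        '/\s+href\s*=\s*("\s*javascript:[^"]*"|\'\s*javascript:[^\']*\'|javascript:[^\s>]*)/i',
        '',
        $html
    );

    return $html;
}

echo strip_js_attributes('<DIV onmouseover="javascript:self.close();">close</DIV>'), "\n";
// prints: <DIV>close</DIV>
echo strip_js_attributes('<A HREF="javascript:self.close();">bye bye</a>'), "\n";
// prints: <A>bye bye</a>
```

It would be run on the output of strip_tags(), so that only the allowed tags are left to clean up.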


This sounds like a big job -- way out of my league, from both a logic and
regular expression point of view, but a worthwhile cause indeed!!!



I had a quick search for javascript on php.net, and found limited stuff of
interest:

there was this code on
http://www.php.net/manual/en/function.preg-replace.php

<?
// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.

$search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out javascript
                 "'<[\/\!]*?[^<>]*?>'si",           // Strip out html tags
                 "'([\r\n])[\s]+'",                 // Strip out white space
                 "'&(quot|#34);'i",                 // Replace html entities
                 "'&(amp|#38);'i",
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i",
                 "'&#(\d+);'e");                    // evaluate as php

$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "&",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169),
                  "chr(\\1)");

$text = preg_replace ($search, $replace, $document);
?>
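One thing to watch in the manual's code: the 'e' modifier on the last pattern was deprecated in PHP 5.5 and removed in PHP 7, so that line no longer works on current versions. A rough equivalent for just that one rule, using preg_replace_callback(), might look like this (decode_numeric_entities() is a name I made up for the example):

```php
<?php
// Hypothetical helper: replaces numeric entities such as &#66; with the
// matching character, without relying on the removed /e modifier.
function decode_numeric_entities($text)
{
    return preg_replace_callback(
        '/&#(\d+);/',
        function ($m) {
            // Same idea as the manual's chr(\1): single-byte characters only.
            return chr((int) $m[1]);
        },
        $text
    );
}

echo decode_numeric_entities('A&#66;C'), "\n";
// prints: ABC
```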


On http://www.php.net/manual/en/function.strip-tags.php I found this
warning:

"This function does not modify any attributes on the tags that you allow
using allowable_tags, including the style and onmouseover attributes that a
mischievous user may abuse when posting text that will be shown to other
users."
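A quick way to see what that warning means in practice (the input string is just something I made up for the test):

```php
<?php
// Made-up user input: an allowed <b> tag carrying an event handler,
// plus a <u> tag that is not on the allowed list.
$input = '<b onmouseover="self.close();">hello</b> <u>world</u>';

// Allow only <b>, <i> and <a>.
$clean = strip_tags($input, '<b><i><a>');

echo $clean, "\n";
// prints: <b onmouseover="self.close();">hello</b> world
```

The <u> tag is gone, but the onmouseover on the allowed <b> tag comes through untouched.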


There are also a number of interesting comments there regarding these events.


Perhaps the solution is a regular expression, or perhaps it's to do with XML
parsing, with which I've not had any experience.
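For the parsing route, here is a rough sketch using PHP's DOM extension rather than a regular expression. It assumes the DOM extension is available, that the input is plain ASCII or Latin-1 (loadHTML() guesses the encoding otherwise), and strip_js_with_dom() is a name invented for this example. The parser lowercases tag and attribute names, so the output won't match the input character for character, and this shouldn't be taken as a complete filter.

```php
<?php
// Hypothetical helper: parser-based sketch using the bundled DOM extension.
function strip_js_with_dom($html)
{
    $doc = new DOMDocument();

    // Sloppy user markup triggers parser warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it has a single root element.
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*') as $el) {
        // Collect names first, then remove, so the attribute map
        // is not modified while it is being iterated.
        $remove = array();
        foreach ($el->attributes as $attr) {
            $name  = strtolower($attr->name);
            $value = strtolower(trim($attr->value));
            if (strpos($name, 'on') === 0) {
                $remove[] = $attr->name;   // onclick, onmouseover, ...
            } elseif ($name === 'href' && strpos($value, 'javascript:') === 0) {
                $remove[] = $attr->name;   // javascript: links
            }
        }
        foreach ($remove as $name) {
            $el->removeAttribute($name);
        }
    }

    // Serialise only what is inside the wrapper <div>.
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_js_with_dom('<A HREF="#" onclick="self.close();">bye bye</a>'), "\n";
```

Because the parser decodes entities before the check runs, a HREF written as '&#106;avascript:...' should be caught as well, which the regular expression approach would miss.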


Sorry for the long post!!!


Justin French
--------------------
Creative Director
http://Indent.com.au
--------------------

