>> I've tried >> '<script[\S\s]*/script>' >> but that didn't work properly. I'm fairly basic in my knowledge of >> Python, so I'm still trying to learn re. >> What pattern would work? > > I use re.compile("<script.*?</script>",re.DOTALL) > for scripts. I strip this out first since my tag stripping re will > strip out script tags as well hope this was of help.
This won't catch various alterations of < script > doEvil() < / script > which is valid html/xhtml. For less valid html, but still attemptable, one might find something like <scrip<script>hah</script>t>doEvil()</script> which, if you nuke your pattern, leaves the valid but unwanted <script>doEvil()</script> I'd propose that it's better to use something such as BeautifulSoup that actually parses the HTML, and then skim through it whitelisting the tags you plan to allow, and skipping the emission of any tags that don't make the whitelist. -tkc -- http://mail.python.org/mailman/listinfo/python-list