> p = re.compile("(\<script.*>*\</script>)",re.IGNORECASE | re.DOTALL)
> m = p.search(data)

First, I presume you didn't copy & paste your expression, as it looks like you're missing a period before the second asterisk. As written, ">*" only matches any number of greater-than signs ahead of the closing "</script>" tag. Second, you may be getting some odd results because you're not using a raw string of the form r'(\<script...script>)'.

> The problem is that I'm getting everything from the 1st
> script's start tag to the last script's end tag in one
> group - so it seems like it parses the string from both
> ends therefore removing far more from that data than I
> want. What am I doing wrong?

Looks like you want the non-greedy modifier to the "*" described at

  http://docs.python.org/lib/re-syntax.html

(searching the page for "greedy" should turn up the paragraph on the modifiers).

You likely want something more like:

  r'<script[^>]*>.*?</script>'

In the first atom, you're looking for the remainder of the opening script tag (as much stuff that isn't a ">" as possible). Then you close the tag with the ">", and then take as little as possible (".*?") of anything until you find the closing "</script>" tag.

HTH,

-tkc
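
P.S. In case it helps to see the greedy and non-greedy forms side by side, here's a rough sketch. The sample data is made up for illustration (I don't know what your real input looks like), and the variable names are just mine:

```python
import re

# Hypothetical sample data: two script blocks with ordinary markup between them.
data = (
    '<p>before</p>'
    '<script type="text/javascript">var a = 1;</script>'
    '<p>keep me</p>'
    '<SCRIPT>\nvar b = 2;\n</SCRIPT>'
    '<p>after</p>'
)

flags = re.IGNORECASE | re.DOTALL

# Greedy: ".*" runs on to the LAST closing tag, swallowing the markup in between.
greedy = re.compile(r'<script[^>]*>.*</script>', flags)

# Non-greedy: ".*?" stops at the FIRST closing tag after each opening tag.
lazy = re.compile(r'<script[^>]*>.*?</script>', flags)

greedy_matches = greedy.findall(data)
lazy_matches = lazy.findall(data)

# Remove each script block individually, leaving the rest of the markup alone.
stripped = lazy.sub('', data)

print(len(greedy_matches))  # 1
print(len(lazy_matches))    # 2
print(stripped)             # <p>before</p><p>keep me</p><p>after</p>
```

The greedy version comes back with a single match that includes the "keep me" paragraph, which sounds like what you're seeing. A regex like this is only a quick fix for reasonably tidy input; it can still be fooled by things such as a "</script>" inside a JavaScript string literal.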