I'm working on a program to remove tags from an HTML document, leaving just the content, and I want to keep it simple. I've finished a system that removes simple tags, but I also want all CSS and JS to be removed, including the content of the script and style elements. What re pattern could I use to do that?
I've tried '<script[\S\s]*/script>' but that didn't work properly. My knowledge of Python is fairly basic, and I'm still learning re. What pattern would work?
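For illustration, here is a minimal sketch of one possible pattern. One likely reason the attempt above misbehaves is that `*` is greedy: on a page with several script blocks it matches from the first `<script` to the last `/script>` and removes everything in between. The sketch uses the non-greedy `*?` instead. The names `SCRIPT_STYLE_RE`, `strip_script_and_style`, and `sample` are hypothetical, chosen for this example, and the pattern assumes reasonably well-formed HTML; it is not a robust HTML parser.

```python
import re

# Hypothetical pattern: matches a whole <script> or <style> element,
# including its content. ".*?" is non-greedy, so each match stops at the
# first closing tag, and \1 makes the closing tag name match the opening one.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,  # ignore tag case; let "." match newlines too
)


def strip_script_and_style(html):
    """Return html with every script and style element removed."""
    return SCRIPT_STYLE_RE.sub("", html)


# Made-up sample document, for demonstration only.
sample = (
    "<html><head><style>p { color: red; }</style>"
    "<script>var a = 1;</script></head>"
    "<body><p>Hello</p><script src='x.js'></script><p>World</p></body></html>"
)

print(strip_script_and_style(sample))
```

Running this step before the existing tag-removal step should leave only the text content. If the input may contain malformed markup, or the text `</script>` inside a JavaScript string, the standard library's `html.parser` module may be a safer choice than a regular expression.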