Hey,

Could anyone please comment on the cleanest way to strip HTML tags and keep only the text they surround?
I know Beautiful Soup is a convenient tool, but I’m interested to know what the most minimal way to do it would be. People say you generally shouldn’t use regular expressions on a non-regular language like HTML, so I was thinking about using XPath or lxml, which seem like clean, general-purpose tools for the job. I did find an example for doing this with the re module, though.

Would it be fair to say that a regex is fine if you just want to strip the tags, but that you need to build a tree-like object if you want the ability to select which nodes to keep and which to discard? Can XPath / lxml do that? What are the chief differences between XPath / lxml and Beautiful Soup?

Thanks,
Julius
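
P.S. To make the question concrete, here is a minimal sketch of the distinction I have in mind, using only the standard library. It is an illustration rather than the re example mentioned above: the names strip_tags_regex, TextExtractor, strip_tags_parser and the SAMPLE string are made up for this post, and html.parser is only a stand-in for a real tree builder such as lxml. The regex version removes anything that looks like a tag and keeps all remaining text, while the parser version can also discard the contents of chosen elements (here, assuming script and style are the ones to drop).

```python
import re
from html.parser import HTMLParser

# Hypothetical sample document, made up for illustration.
SAMPLE = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hello, <b>world</b>!</p><script>var x = 1;</script></body></html>"
)


def strip_tags_regex(markup):
    """Delete anything that looks like a tag; every piece of text is kept,
    including the bodies of <script> and <style>."""
    return re.sub(r"<[^>]+>", "", markup)


class TextExtractor(HTMLParser):
    """Collect text, skipping whatever sits inside the elements named in SKIP."""

    SKIP = {"script", "style"}  # assumption: the nodes we want to discard

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags_parser(markup):
    """Strip tags with an event-driven parser, dropping skipped elements' text."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(strip_tags_regex(SAMPLE))
print(strip_tags_parser(SAMPLE))
```

On this sample the regex version leaves the CSS and JavaScript text in the output, while the parser version returns only "Hello, world!". As far as I understand, the regex would also misbehave on things like a ">" inside an attribute value or a comment, which is part of why I am asking.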