Alnilam wrote:
> On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
>> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
>> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
>> > 200-module PyXML package installed. And you don't want the 75Kb
>> > BeautifulSoup?
>>
>> I wasn't aware that I had PyXML installed, and can't find a reference
>> to having it installed in pydocs. ...
>
> Ugh. Found it. Sorry about that, but I still don't understand why
> there isn't a simple way to do this without using PyXML, BeautifulSoup,
> or libxml2dom. What's the point in having sgmllib, htmllib,
> HTMLParser, and formatter all built in if I have to use someone
> else's modules to write a couple of lines of code that achieve the
> simple thing I want? I get the feeling that this would be easier if I
> just broke down and wrote a couple of regular expressions, but it
> hardly seems a 'pythonic' way of going about things.
This is simply a gross misunderstanding of what BeautifulSoup or lxml accomplish. Dealing with malformed HTML while trying to make _some_ sense of it is by no means trivial. And just because you can come up with a few lines of regex code that work for your current use case doesn't mean they serve as a general HTML-fixing routine. Or do you think the rather long history and 75Kb of code behind BeautifulSoup exist because its creator wasn't aware of regexes?

It also makes no sense to stuff everything remotely useful into the standard library. That would force aligning development and release cycles, resulting in fewer features and less stability than one could wish for.

And to be honest: I fail to see where your problem is. BeautifulSoup is a single Python file. So whatever you carry with you from machine to machine, if it's capable of holding a file of your own code, you can simply put BeautifulSoup beside it - even if it's a floppy disk.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list
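For what it's worth, here is a minimal sketch of the stdlib-only approach the original poster was asking about: extracting links by subclassing the standard HTML parser. The module name shown is the Python 3 one (`html.parser`); in the Python 2 of this thread it was the top-level `HTMLParser` module. The sample document and the `LinkExtractor` class are illustrative, not from either post; this works for mildly sloppy markup but, as argued above, it is not a general fixer for badly broken HTML.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags as the parser streams through the document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Note the unquoted attribute and the unclosed <p>: the stdlib parser
# tolerates this much, but offers no tree and no repair of worse breakage.
doc = '<p>See <a href="http://example.com">here</a> and <a href=/local>there</a>'
parser = LinkExtractor()
parser.feed(doc)
print(parser.links)
```

This is roughly the "couple of lines of code" the poster wanted, and it shows the trade-off: the event-driven stdlib parser is fine for well-behaved pages, while BeautifulSoup's value lies in the cases where the markup is too broken for this to work.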