On 2008-01-23 01:29, Gabriel Genellina wrote:
> On Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> wrote:
>
>> On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
>>> Alnilam wrote:
>>>> On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
>>>>>> Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
>>>>>> -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
>>>>>> 200-modules PyXML package installed. And you don't want the 75Kb
>>>>>> BeautifulSoup?
>>>> Ugh. Found it. Sorry about that, but I still don't understand why
>>>> there isn't a simple way to do this without using PyXML, BeautifulSoup
>>>> or libxml2dom. What's the point in having sgmllib, htmllib,
>>>> HTMLParser, and formatter all built in if I have to use someone
>>>> else's modules to write a couple of lines of code that achieve the
>>>> simple thing I want? I get the feeling that this would be easier if I
>>>> just broke down and wrote a couple of regular expressions, but it
>>>> hardly seems a 'pythonic' way of going about things.
>>> This is simply a gross misunderstanding of what BeautifulSoup or lxml
>>> accomplish. Dealing with mal-formatted HTML while trying to make _some_
>>> sense of it is by no means trivial. And just because you can come up
>>> with a few lines of code using regexes that work for your current
>>> use case doesn't mean that they serve as a general html-fixing routine.
>>> Or do you think the rather long history and 75Kb of code for BS are
>>> because its creator wasn't aware of regexes?
>> I am, by no means, trying to trivialize the work that goes into
>> creating the numerous modules out there. However, as a relatively
>> novice programmer trying to figure something out, the fact that these
>> modules are pushed on people with such zealous devotion that you take
>> offense at my desire not to use them gives me a bit of pause.
>> I use non-included modules for tasks that require them, when the
>> capability to do something clearly can't be achieved easily another way
>> (e.g. MySQLdb). I am sure that there will be plenty of times when I will
>> use BeautifulSoup. In this instance, however, I was trying to solve a
>> specific problem which I attempted to lay out clearly from the outset.
>>
>> I was asking this community if there was a simple way to use only the
>> tools included with Python to parse a bit of HTML.
There are lots of ways of doing HTML parsing in Python. A common one is to
use mxTidy to convert the HTML into valid XHTML and then use ElementTree
to parse the data:

http://www.egenix.com/files/python/mxTidy.html
http://docs.python.org/lib/module-xml.etree.ElementTree.html

For simple tasks you can also use the HTMLParser that's part of the
Python std lib:

http://docs.python.org/lib/module-HTMLParser.html

Which tool to use really depends on what you are trying to solve.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 23 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
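As a rough illustration of the stdlib-only route suggested above: a minimal
sketch of subclassing HTMLParser to pull the text out of one kind of tag.
The class name `ParagraphExtractor` and the choice of extracting `<p>` text
are illustrative, not from the thread; the code below uses the Python 3
module name `html.parser` (in the Python 2.x of this thread the module was
simply called `HTMLParser`).

```python
# Minimal sketch: collect the text content of every <p> element using
# only the standard library's HTML parser. Illustrative only; real-world
# HTML is often malformed, which is exactly the case tools like
# BeautifulSoup and mxTidy exist to handle.
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Accumulate the text of each <p>...</p> into self.paragraphs."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_p = False        # are we currently inside a <p>?
        self.paragraphs = []     # finished paragraph texts
        self._chunks = []        # text fragments of the current <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.paragraphs.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self.in_p:
            self._chunks.append(data)


parser = ParagraphExtractor()
parser.feed("<html><body><p>First.</p><div><p>Second.</p></div></body></html>")
print(parser.paragraphs)  # ['First.', 'Second.']
```

This is event-driven rather than tree-based: HTMLParser calls the handler
methods as it scans the input, so you track state yourself instead of
navigating a DOM, which is why it stays workable only for simple tasks.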