Re: HTML parsing confusion

2008-01-23 Thread Gabriel Genellina
En Wed, 23 Jan 2008 10:40:14 -0200, Alnilam <[EMAIL PROTECTED]> escribió: > Skipping past html validation, and html to xhtml 'cleaning', and > instead starting with the assumption that I have files that are valid > XHTML, can anyone give me a good example of how I would use _ htmllib, > HTMLParser

Re: HTML parsing confusion

2008-01-23 Thread Jerry Hill
On Jan 23, 2008 7:40 AM, Alnilam <[EMAIL PROTECTED]> wrote: > Skipping past html validation, and html to xhtml 'cleaning', and > instead starting with the assumption that I have files that are valid > XHTML, can anyone give me a good example of how I would use _ htmllib, > HTMLParser, or ElementTre

Re: HTML parsing confusion

2008-01-23 Thread Alnilam
On Jan 23, 3:54 am, "M.-A. Lemburg" <[EMAIL PROTECTED]> wrote: > >> I was asking this community if there was a simple way to use only the > >> tools included with Python to parse a bit of html. > > There are lots of ways doing HTML parsing in Python. A common > one is e.g. using mxTidy to convert

Re: HTML parsing confusion

2008-01-23 Thread cokofreedom
> The pages I'm trying to write this code to run against aren't in the > wild, though. They are static html files on my company's lan, are very > consistent in format, and are (I believe) valid html. Obvious way to check this is to go to http://validator.w3.org/ and see what it tells you about you

Re: HTML parsing confusion

2008-01-23 Thread M.-A. Lemburg
On 2008-01-23 01:29, Gabriel Genellina wrote: > En Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> escribió: > >> On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: >>> Alnilam wrote: On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> Pardon me, but the

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > I was asking this community if there was a simple way to use only the > > tools included with Python to parse a bit of html. > > If you *know* that your document is valid HTML, you can use the HTMLParser   > module in the stan

Re: HTML parsing confusion

2008-01-22 Thread [EMAIL PROTECTED]
On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > I was asking this community if there was a simple way to use only the > > tools included with Python to parse a bit of html. > > If you *know* that your document is valid HTML, you can use the HTMLParser > module in the stand

Re: HTML parsing confusion

2008-01-22 Thread Gabriel Genellina
En Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> escribió: > On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: >> Alnilam wrote: >> > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, >>

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: > Alnilam wrote: > > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > >> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > >>

Re: HTML parsing confusion

2008-01-22 Thread Diez B. Roggisch
Alnilam wrote: > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, >> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous >> > 200-modules PyXML package installed. And you don't want the 75Kb >> > Beau

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > > 200-modules PyXML package installed. And you don't want the 75Kb > > BeautifulSoup? > > I wasn'

Re: HTML parsing confusion

2008-01-22 Thread Paul McGuire
On Jan 22, 7:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > ...I move from computer to > computer regularly, and while all have a recent copy of Python, each > has different (or no) extra modules, and I don't always have the > luxury of downloading extras. That being said, if there's a simple way > of

Re: HTML parsing confusion

2008-01-22 Thread Alnilam
> Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > 200-modules PyXML package installed. And you don't want the 75Kb > BeautifulSoup? I wasn't aware that I had PyXML installed, and can't find a reference to

Re: HTML parsing confusion

2008-01-22 Thread Paul Boddie
On 22 Jan, 06:31, Alnilam <[EMAIL PROTECTED]> wrote: > Sorry for the noob question, but I've gone through the documentation > on python.org, tried some of the diveintopython and boddie's examples, > and looked through some of the numerous posts in this group on the > subject and I'm still rather co

Re: HTML parsing confusion

2008-01-22 Thread John Machin
On Jan 22, 4:31 pm, Alnilam <[EMAIL PROTECTED]> wrote: > Sorry for the noob question, but I've gone through the documentation > on python.org, tried some of the diveintopython and boddie's examples, > and looked through some of the numerous posts in this group on the > subject and I'm still rather