Re: beautifulsoup .vs tidy

2006-07-02 Thread uche . ogbuji
bruce wrote: > hi paddy... > > that's exactly what i'm trying to accomplish... i've used tidy, but it seems > to still generate warnings... > > initFile -> tidy -> cleanFile -> perl app (using xpath/libxml) > > the xpath/libxml functions in the perl app complain regarding the file. my > thought is

Re: beautifulsoup .vs tidy

2006-07-02 Thread Fredrik Lundh
Ravi Teja wrote: >> Of course, lxml should be able to do this kind of thing as well. I'd be >> interested to know why this "is not a good idea", though. > > No reason that you don't know already. > > http://www.boddie.org.uk/python/HTML.html > > "If the document text is well-formed XML, we coul

Re: beautifulsoup .vs tidy

2006-07-01 Thread Ravi Teja
Paul Boddie wrote: > Ravi Teja wrote: > > > > 1.) XPath is not a good idea at all with "malformed" HTML or perhaps > > web pages in general. > > import libxml2dom > import urllib > f = urllib.urlopen("http://wiki.python.org/moin/") > s = f.read() > f.close() > # s contains HTML not XML text > d =

Re: beautifulsoup .vs tidy

2006-07-01 Thread Matt Good
bruce wrote: > that's exactly what i'm trying to accomplish... i've used tidy, but it seems > to still generate warnings... > > initFile -> tidy -> cleanFile -> perl app (using xpath/libxml) > > the xpath/libxml functions in the perl app complain regarding the file. my > thought is that tidy isn't

Re: beautifulsoup .vs tidy

2006-07-01 Thread Paul Boddie
Ravi Teja wrote: > > 1.) XPath is not a good idea at all with "malformed" HTML or perhaps > web pages in general. import libxml2dom import urllib f = urllib.urlopen("http://wiki.python.org/moin/") s = f.read() f.close() # s contains HTML not XML text d = libxml2dom.parseString(s, html=1) # get th

Re: beautifulsoup .vs tidy

2006-07-01 Thread Fredrik Lundh
bruce wrote: > that's exactly what i'm trying to accomplish... i've used tidy, but it seems > to still generate warnings... > > initFile -> tidy -> cleanFile -> perl app (using xpath/libxml) > > the xpath/libxml functions in the perl app complain regarding the file. what exactly do they complai

RE: beautifulsoup .vs tidy

2006-07-01 Thread bruce
turday, July 01, 2006 1:09 AM To: python-list@python.org Subject: Re: beautifulsoup .vs tidy bruce wrote: > hi... > > never used perl, but i have an issue trying to resolve some html that > appears to be "dirty/malformed" regarding the overall structure. in > researchi

Re: beautifulsoup .vs tidy

2006-07-01 Thread Paddy
bruce wrote: > hi... > > never used perl, but i have an issue trying to resolve some html that > appears to be "dirty/malformed" regarding the overall structure. in > researching validators, i came across the beautifulsoup app and wanted to > know if anybody could give me pros/cons of the app as i

Re: beautifulsoup .vs tidy

2006-06-30 Thread Ravi Teja
bruce wrote: > hi... > > never used perl, but i have an issue trying to resolve some html that > appears to be "dirty/malformed" regarding the overall structure. in > researching validators, i came across the beautifulsoup app and wanted to > know if anybody could give me pros/cons of the app as it