bruce wrote:
> hi paddy...
>
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
> initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file. my
> thought is
Ravi Teja wrote:
>> Of course, lxml should be able to do this kind of thing as well. I'd be
>> interested to know why this "is not a good idea", though.
>
> No reason that you don't know already.
>
> http://www.boddie.org.uk/python/HTML.html
>
> "If the document text is well-formed XML, we coul
Paul Boddie wrote:
> Ravi Teja wrote:
> >
> > 1.) XPath is not a good idea at all with "malformed" HTML or perhaps
> > web pages in general.
>
> import libxml2dom
> import urllib
> f = urllib.urlopen("http://wiki.python.org/moin/")
> s = f.read()
> f.close()
> # s contains HTML not XML text
> d =
bruce wrote:
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
> initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file. my
> thought is that tidy isn't
Ravi Teja wrote:
>
> 1.) XPath is not a good idea at all with "malformed" HTML or perhaps
> web pages in general.
import libxml2dom
import urllib
f = urllib.urlopen("http://wiki.python.org/moin/")
s = f.read()
f.close()
# s contains HTML not XML text
d = libxml2dom.parseString(s, html=1)
# get th
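
As a rough illustration of the same point (a lenient HTML parser copes with markup that a strict XML parser rejects), here is a minimal sketch using only the Python 3 standard library. It is not the libxml2dom approach quoted above and it does not use XPath; `LinkCollector`, `collect_links` and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags; unclosed tags are tolerated."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    """Feed possibly malformed HTML to the parser and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Deliberately "dirty" sample: <p>, <b>, <ul>, <li> and <html> are never closed.
DIRTY_HTML = """<html><body>
<p>First paragraph <b>bold text
<ul><li><a href="/moin/FrontPage">Front page</a>
<li><a href="/moin/RecentChanges">Recent changes</a>
</body>"""

links = collect_links(DIRTY_HTML)
print(links)
```

The parser never builds a tree, so it has nothing to complain about when tags are left open; if you need real XPath queries you still want an HTML-aware tree builder such as the `html=1` mode shown above.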
bruce wrote:
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
> initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file.
what exactly do they complai
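
One way to answer that question without guessing is to run the cleaned file through a strict XML parser and look at the first error it reports. The sketch below assumes Python 3 and uses the standard library's `xml.etree.ElementTree` rather than the Perl libxml bindings from the original pipeline; `first_xml_error` and the two sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_xml_error(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # ParseError.position is a (line, column) tuple for the failure point.
        line, column = exc.position
        return line, column, str(exc)
    return None


# Hypothetical samples: the second one leaves <p> unclosed.
CLEAN = "<html><body><p>ok</p></body></html>"
DIRTY = "<html><body><p>unclosed paragraph</body></html>"

print(first_xml_error(CLEAN))
print(first_xml_error(DIRTY))
```

If the output of tidy still fails a check like this, the problem is in the tidy step (or its options) rather than in the XPath code that comes after it.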
Saturday, July 01, 2006 1:09 AM
To: python-list@python.org
Subject: Re: beautifulsoup .vs tidy
bruce wrote:
> hi...
>
> never used perl, but i have an issue trying to resolve some html that
> appears to be "dirty/malformed" regarding the overall structure. in
> researchi
bruce wrote:
> hi...
>
> never used perl, but i have an issue trying to resolve some html that
> appears to be "dirty/malformed" regarding the overall structure. in
> researching validators, i came across the beautifulsoup app and wanted to
> know if anybody could give me pros/cons of the app as it