On 10/23/2011 09:06 PM, ???????? wrote:
C:\Documents and Settings\peng>cd c:\python32
C:\Python32>python
Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
import lxml.html
sfile='http://finance.yahoo.com/q/op?s=A+Options'
root=lxml.html.parse(sfile).getroot()
There is no problem parsing:
http://finance.yahoo.com/q/op?s=A+Options
Why can't I parse
http://frux.wikispaces.com/ ?
import lxml.html
sfile='http://frux.wikispaces.com/'
root=lxml.html.parse(sfile).getroot()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python32\lib\site-packages\lxml\html\__init__.py", line 692, in parse
return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
File "lxml.etree.pyx", line 2942, in lxml.etree.parse (src/lxml/lxml.etree.c:54187)
File "parser.pxi", line 1528, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:79485)
File "parser.pxi", line 1557, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:79768)
File "parser.pxi", line 1457, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:78843)
File "parser.pxi", line 997, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:75698)
File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
File "parser.pxi", line 583, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71927)
IOError: Error reading file 'b'http://frux.wikispaces.com/'': b'failed to load external entity "http://frux.wikispaces.com/"'
>
Double-spacing makes your message much harder to read. I can only
comment in a general way, in any case. Most HTML is malformed and not
legal HTML. Although I don't have any experience parsing it, I do
with XML, which has similar problems.
The first thing I'd do is separate loading the byte string from the
website from parsing those bytes. Further, I'd make a local copy of
those bytes, so you can test repeatably. For example, you could run the
wget utility to copy the bytes locally and create a file.
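If you would rather stay in Python than use wget, here is a minimal
sketch of that separation. It assumes a reasonably recent Python 3 with
lxml installed; the function names and the local file name are made up
for illustration, and I have not run it against your site.

```python
import urllib.request


def fetch_bytes(url, timeout=30):
    """Step 1: download the raw bytes. No parsing happens here."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_bytes(data, path):
    """Step 2: keep a local copy so every later test sees identical input."""
    with open(path, "wb") as f:
        f.write(data)
    return path


def parse_local(path):
    """Step 3: parse the saved copy. A failure here is a parsing problem,
    not a download problem."""
    # Imported here (an illustrative choice) so that steps 1 and 2 can be
    # exercised even on a machine where lxml is not installed.
    import lxml.html
    return lxml.html.parse(path).getroot()
```

You would call save_bytes(fetch_bytes('http://frux.wikispaces.com/'),
'frux.html') once, then call parse_local('frux.html') as often as you
like. If the first call raises, the problem is in fetching the page; if
only the second one does, the problem is in the parsing.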
--
DaveA