On Apr 10, 1:47 am, Alain Ketterlin <al...@dpt-info.u-strasbg.fr> wrote:
> jdownie <jdow...@gmail.com> writes:
> > I'm trying to get xml.sax to interpret a file that begins with…
>
> > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
>
> > After a while I get...
>
> > http://www.w3.org/TR/html4/loose.dtd:31:2:error in processing
> > external entity reference
>
> > …although…
>
> > time curl http://www.w3.org/TR/html4/loose.dtd
> > [works]
>
> You're mistaken. There is no problem fetching the file, but there is a
> problem while parsing the file (at line 31, where you find a comment in
> an entity declaration, which is not acceptable in XML).
>
> You're trying to use HTML's SGML DTD in an XML document. Direct your
> doctype to XHTML's DTD, and everything will be fine (hopefully).
>
> BTW, your installation will probably let you use a locally cached copy
> of the DTD, instead of fetching a file at every parse. How this works
> depends somehow on the parser you use.
>
> -- Alain.
Excellent. I think I understand that. I'll look around for the xhtml version of the html4/loose DTD and try what you suggest. Thanks very much.
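For anyone finding this thread later, here is a rough sketch of the "locally cached copy" idea with xml.sax. It is only one possible way to do it, written for Python 3, and several things in it are my own assumptions rather than anything from the thread: the names LocalDTDResolver, TextCollector, parse_with_local_dtd and LOCAL_DTDS are made up, and the one-line DTD is a stand-in for the real XHTML DTD, which you would normally keep in a file on disk. Note that recent Python 3 releases leave external entity processing switched off by default, so the sketch turns feature_external_ges on explicitly.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical stand-in for a locally cached DTD, keyed by public identifier.
# A real setup would read the full XHTML DTD from a file instead.
LOCAL_DTDS = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": b'<!ENTITY nbsp "&#160;">\n',
}


class LocalDTDResolver(EntityResolver):
    """Serve known DTDs from LOCAL_DTDS instead of fetching them."""

    def __init__(self):
        self.requests = []  # (publicId, systemId) pairs the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        if publicId in LOCAL_DTDS:
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(LOCAL_DTDS[publicId]))
            return source
        # Unknown entity: same as the default resolver (use the system id).
        return systemId


class TextCollector(ContentHandler):
    """Collect character data so the parse result can be inspected."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


def parse_with_local_dtd(document):
    """Parse bytes with xml.sax, resolving the DTD through LocalDTDResolver."""
    parser = xml.sax.make_parser()
    # Off by default in recent Python 3 releases; needed for the DTD lookup.
    parser.setFeature(feature_external_ges, True)
    resolver = LocalDTDResolver()
    handler = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.text), resolver.requests


DOC = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    b"<html><body><p>a&nbsp;b</p></body></html>"
)

if __name__ == "__main__":
    text, requests = parse_with_local_dtd(DOC)
    print(repr(text), requests)
```

The resolver is asked for the DTD by its public identifier, so the document can keep the usual w3.org system identifier while the parser never goes to the network for it. Anything not listed in LOCAL_DTDS still falls through to the default behaviour, so you may prefer to raise an error there instead if you want to rule out network access entirely.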