Richard Lewis wrote:
> On Mon, 20 Jun 2005 14:27:17 +0200, "Fredrik Lundh"
> <[EMAIL PROTECTED]> said:
> >
> > well, you're messing it up all by yourself. getting rid of all the
> > codecs and unicode2charrefs nonsense will fix this:
> >
> Thanks for being so patient and understanding.
>
> OK, I've taken it all out. The only thinking about encoding I had to do
> in the actual code I'm working on was to use:
> file.write(document.toxml(encoding="utf-8"))
>
> instead of just
> file.write(document.toxml())
>
> because otherwise I got errors on copyright symbol characters.
sounds like a bug in minidom...

> My code now works without generating any errors but Konqueror's KHTML
> and Embedded Advanced Text Viewer and IE5 on the Mac still show
> capital-A-with-a-tilde in all the files that have been
> generated/altered. Whereas my text editor and Mozilla show them
> correctly.
>
> The "unicode2charrefs() nonsense" was an attempt to make it output with
> character references rather than literal characters for all characters
> with codes greater than 128. Is there a way of doing this?

character references refer to code points in the Unicode code space, so you cannot just convert the bytes you get after converting to UTF-8. however, if you're only using characters from the ISO Latin 1 set (which is a strict subset of Unicode), you could encode to "iso-8859-1" and run unicode2charrefs on the result.

but someone should really fix minidom so it does the right thing.

(fwiw, if you use my ElementTree kit, you can simply do

    tree.write(file, encoding="us-ascii")

and the toolkit will then use charrefs for any character that's not plain ascii. you can get ElementTree from here:

    http://effbot.org/zone/element-index.htm
)

</F>

--
http://mail.python.org/mailman/listinfo/python-list
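The points above can be illustrated with a short sketch. It is written for a modern Python 3, not the Python 2 of this thread, and the sample string and the helper name `bytes_to_charrefs` are hypothetical. The A-with-tilde symptom is assumed to come from viewers decoding the UTF-8 output as Latin-1. The sketch shows why turning UTF-8 bytes into character references gives the wrong result, why the same byte-by-byte conversion is correct for Latin-1 bytes, and two general alternatives: encoding minidom's output with the standard `xmlcharrefreplace` error handler, and the `xml.etree.ElementTree` module (the standard-library descendant of the ElementTree kit) writing with `encoding="us-ascii"`.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample text containing a copyright sign (U+00A9).
text = "\u00a9 2005"

# The likely symptom: UTF-8 bytes decoded as Latin-1 turn U+00A9 into
# "Â©", i.e. a capital A with a circumflex/tilde-like accent in front.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("iso-8859-1"))  # prints "Â© 2005"


def bytes_to_charrefs(data):
    """Hypothetical helper: keep ASCII bytes, turn other bytes into &#N;."""
    return "".join(chr(b) if b < 128 else "&#%d;" % b for b in data)


# Wrong: charrefs name code points, not UTF-8 bytes, so U+00A9
# becomes the two references &#194;&#169; instead of one.
wrong = bytes_to_charrefs(utf8_bytes)


def latin1_charrefs(s):
    # Correct only for Latin-1 text: each Latin-1 byte value equals the
    # character's code point. Raises UnicodeEncodeError otherwise.
    return bytes_to_charrefs(s.encode("iso-8859-1"))


# General approach for minidom output: let the codec emit charrefs.
# Assumes non-ASCII characters appear only in text/attribute values,
# since a charref inside a tag name would not be well-formed XML.
doc = minidom.parseString("<p/>")
doc.documentElement.appendChild(doc.createTextNode(text))
ascii_xml = doc.toxml().encode("us-ascii", "xmlcharrefreplace")

# ElementTree writes charrefs itself when asked for us-ascii output.
root = ET.Element("p")
root.text = text
et_xml = ET.tostring(root, encoding="us-ascii")

print(ascii_xml)
print(et_xml)
```

The `xmlcharrefreplace` route works for any Unicode text, whereas the Latin-1 route fails as soon as a character outside ISO 8859-1 appears; in both cases the resulting file is plain ASCII, so it displays the same no matter which encoding a viewer guesses.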