Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Steve D'Aprano
On Wed, 28 Dec 2016 02:05 am, Skip Montanaro wrote:
> I am trying to parse some XML which doesn't specify an encoding (Python
> 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII
> data. No great surprise there, but I'm having trouble getting it to use
> another encoding.
F

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Peter Otten wrote:
> works, but to go back to the bytes that the XML parser needs the
> "preferred encoding", in your case ASCII, will be used.
Correction: it's probably sys.getdefaultencoding() rather than locale.getpreferredencoding(). So all systems with a sane configuration will behave the sa
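For anyone following along, the two settings being distinguished here can be inspected directly. This is only a small sketch using the standard library; the variable names are illustrative and not from the thread. On Python 2 the first call typically reports 'ascii', while the second depends on the locale settings of the machine.

```python
import locale
import sys

# Interpreter-wide default used for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()

# Encoding derived from the user's locale settings (may differ per machine).
preferred_encoding = locale.getpreferredencoding()

print("sys.getdefaultencoding():     %s" % default_encoding)
print("locale.getpreferredencoding(): %s" % preferred_encoding)
```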

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Skip Montanaro wrote:
> Peter> Isn't UTF-8 the default?
>
> Apparently not.
Sorry, I meant the default for XML.
> I believe in my reading it said that it used whatever
> locale.getpreferredencoding() returned. That's problematic when you
> live in a country that thinks ASCII is everything. Per

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Skip Montanaro
Peter> Isn't UTF-8 the default?
Apparently not. I believe in my reading it said that it used whatever locale.getpreferredencoding() returned. That's problematic when you live in a country that thinks ASCII is everything. Personally, I think UTF-8 should be the default, but that train's long left t

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Skip Montanaro wrote:
> I am trying to parse some XML which doesn't specify an encoding (Python
> 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII
> data. No great surprise there, but I'm having trouble getting it to use
> another encoding. First, I tried specifying the e

encoding="utf8" ignored when parsing XML

2016-12-27 Thread Skip Montanaro
I am trying to parse some XML which doesn't specify an encoding (Python 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII data. No great surprise there, but I'm having trouble getting it to use another encoding. First, I tried specifying the encoding when opening the fil
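One way to approach the situation described in this message, sketched under assumptions: hand the parser raw bytes rather than decoded text, and state the encoding on the parser itself. The sample document and the variable names below are hypothetical, and the data is assumed to be UTF-8; this is not necessarily the fix the thread settled on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: no XML declaration, non-ASCII content stored as UTF-8 bytes.
raw = u"<doc><name>Z\u00fcrich</name></doc>".encode("utf-8")

# Give the parser bytes and name the encoding explicitly, instead of
# decoding the data to text before parsing.
parser = ET.XMLParser(encoding="utf-8")
root = ET.fromstring(raw, parser=parser)

name = root.find("name").text
print(repr(name))
```

The point of the design is that the decoding happens once, inside the parser, so no implicit conversion through the interpreter's default encoding is involved.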