[issue22746] cgitb html: wrong encoding for utf-8

2014-12-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: We can convert cgitb.hook to produce ASCII-compatible output with charrefs in 3.x. But there is a problem with str in 2.7: an 8-bit string can contain non-ASCII data, and its encoding is not known in the general case.
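A minimal sketch of the ASCII-compatible output mentioned above, assuming the HTML is already available as a Python 3 str. The helper name `to_ascii_html` is hypothetical and is not part of cgitb.

```python
def to_ascii_html(text):
    """Return text with every non-ASCII character replaced by a numeric charref."""
    # 'xmlcharrefreplace' is a standard error handler accepted by str.encode().
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "<p>caf\u00e9 \u4e0e</p>"  # made-up stand-in for traceback HTML
print(to_ascii_html(sample))  # <p>caf&#233; &#19982;</p>
```

The result contains only ASCII bytes, so it displays the same way whichever ASCII-compatible encoding the browser assumes.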

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-31 Thread Ezio Melotti
Ezio Melotti added the comment:
> In normal HTML utf-8 works fine, doesn't it?
It does; in fact, as long as the encoding used by the browser matches the one used in the file, no charrefs need to be used (except > < and "). Of course, if non-Unicode encodings are used, the range of available c
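To illustrate the point about matching encodings, here is a sketch that declares UTF-8 inside the page and escapes only the markup-significant characters. The function name `render_page` and the page layout are hypothetical, chosen for illustration only.

```python
import html


def render_page(message):
    """Build a small UTF-8 HTML page; only markup characters are escaped."""
    # html.escape() replaces &, < and >, plus quote characters when quote=True.
    body = html.escape(message, quote=True)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body><pre>{}</pre></body></html>\n".format(body)
    )


page = render_page('ValueError: bad value <\u4e0e> & "x"')
data = page.encode("utf-8")  # the bytes on disk must match the declared charset
```

Because the charset is declared in the document, the non-ASCII character can stay as a literal and needs no charref.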

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Wolfgang Rohdewald
Wolfgang Rohdewald added the comment:
> > You need to use codecs.open instead of open
> No, why? in python3 open() supports the errors handler.
Right, but not in Python 2, which has the same problem. I need my code to run with both.
> Do you have a use case for xmlcharrefreplace in the HTML cont

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread R. David Murray
R. David Murray added the comment: In normal HTML utf-8 works fine, doesn't it? It's only when reading from a file (where the browser doesn't know the encoding) that it fails. Do you have a use case for xmlcharrefreplace in the HTML context (which is what cgitb is primarily targeted at)? So

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread STINNER Victor
Changes by STINNER Victor: components: +Unicode; nosy: +haypo

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment:
> You need to use codecs.open instead of open
No, why? In Python 3, open() supports the errors handler.

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: nosy: +ezio.melotti, serhiy.storchaka

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Wolfgang Rohdewald
Wolfgang Rohdewald added the comment: correction: A bug for everyone using non-ASCII characters.

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Wolfgang Rohdewald
Wolfgang Rohdewald added the comment:
> What about
> open(..., encoding='latin-1', errors='xmlcharrefreplace')
That works fine. I tested with a Chinese character: 与. But I do not think the application should work around something that cgitb is supposed to handle. More so since the documentation

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: What about open(..., encoding='latin-1', errors='xmlcharrefreplace')
nosy: +amaury.forgeotdarc; stage: resolved -> needs patch
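A sketch of how the suggested open() call behaves in Python 3. The temporary file and the `html_text` string are made-up stand-ins for a real cgitb log file and its contents.

```python
import os
import tempfile

html_text = "<p>\u4e0e</p>"  # made-up stand-in for cgitb output

fd, path = tempfile.mkstemp(suffix=".html")
os.close(fd)
try:
    # Characters that latin-1 cannot represent are written as numeric charrefs.
    with open(path, "w", encoding="latin-1", errors="xmlcharrefreplace") as f:
        f.write(html_text)
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(raw)  # b'<p>&#19982;</p>'
```

Characters that latin-1 can represent, such as U+00E9, are still written as single latin-1 bytes rather than as charrefs.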

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread Wolfgang Rohdewald
Changes by Wolfgang Rohdewald: resolution: -> remind

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread Wolfgang Rohdewald
Wolfgang Rohdewald added the comment: If you cannot offer a solution for arbitrary Unicode, you have no solution at all. After all, that is what Unicode is about: support ALL languages, not only your own. I do not quite understand why you think this is not a bug. If cgitb encodes Unicode like

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread R. David Murray
R. David Murray added the comment: If you look at the file, you'll find that the data is in utf-8 (at least if your locale is a utf-8 locale). However, HTML is by default interpreted as latin-1, so that's what the web browser displays when you pass the file on disk to it. If you add "encoding=
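The mismatch described above can be reproduced without a browser. This is only an illustration with a made-up sample string: it encodes text as UTF-8 and then decodes the same bytes as latin-1, which is what a reader assuming latin-1 would see.

```python
original = "caf\u00e9"               # 'café'
on_disk = original.encode("utf-8")   # UTF-8 stores U+00E9 as two bytes
misread = on_disk.decode("latin-1")  # each byte becomes its own character

print(on_disk)   # b'caf\xc3\xa9'
print(misread)   # cafÃ©
```

Decoding the same bytes as UTF-8 gives back the original string, so the data in the file is intact and only the interpretation differs.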

[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread Wolfgang Rohdewald
New submission from Wolfgang Rohdewald: The attached script displays non-ASCII characters incorrectly wherever they occur, including the exception message and the comment in the source code. Looking at the produced .html, I can say that cgitb simply passes the single byte utf-8 codes without encodi