Graham Wideman added the comment:

At the moment I've run out of time to exert much forward push on this.

As an interim summary and a suggestion for regrouping: focus on what this page 
is intended to deliver. What concepts should readers of this page be able to 
distinguish and understand when they are finished?

To scope out the needed concepts, I suggest identifying representative 
Unicode-related stumbling blocks (possibly from Stack Overflow questions).

Here's an example case: just trying to get trivial "beyond ASCII" functionality 
to work on Windows (Win7, Python 3.3):

--------------------
s = 'knight \u265E'
print('Hello ' + s)
--------------------

... which fails with:

"UnicodeEncodeError: 'charmap' codec can't encode character '\u265e' in 
position 13: character maps to <undefined>". 

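A naive attempt to fix this by using s.encode() makes the "+" operation fail 
instead: s.encode() returns a bytes object, and Python 3 raises a TypeError 
when asked to concatenate str and bytes.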
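
A minimal sketch of that failure mode, for illustration only. The variable
names (encoded, error_name, message_bytes) are made up for this example and
are not part of the original report:

```python
s = 'knight \u265E'

# str.encode() returns bytes; with no argument it uses UTF-8 in Python 3.
encoded = s.encode()

try:
    # Mixing str and bytes is rejected rather than implicitly converted.
    result = 'Hello ' + encoded
except TypeError as exc:
    result = None
    error_name = type(exc).__name__

# Concatenating as text first and encoding once at the end does work.
message_bytes = ('Hello ' + s).encode('utf-8')
```

The point of the sketch is only that text should stay str until the output
boundary; it does not by itself fix the console problem.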

What paths forward do programmers explore in an effort to have this code (a) 
not throw an exception, and produce at least some output, and (b) make it 
produce the correct output?
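
Two of the paths a programmer might try can be sketched as follows. This is
an illustration under assumptions, not a recommended fix: 'cp437' is assumed
here as a stand-in for the console code page (the real one varies by system),
and an in-memory io.BytesIO stands in for a byte-oriented output stream.

```python
import io

text = 'Hello knight \u265E'

# (a) Avoid the exception and produce at least some output by choosing
# an error handler when encoding to the (assumed) console code page.
lossy = text.encode('cp437', errors='replace')
escaped = text.encode('cp437', errors='backslashreplace')

# (b) Produce the correct bytes by encoding as UTF-8 explicitly.
# 'sink' is a stand-in for a real binary stream such as a file.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding='utf-8', newline='\n')
writer.write(text)
writer.flush()
utf8_bytes = sink.getvalue()
```

Path (a) loses or escapes the character; path (b) preserves it, but whether
it is then displayed correctly still depends on what consumes the bytes
(console code page, font, and so on).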

And why does it work as intended on Linux?
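
One way to start investigating that difference is to look at the encoding
Python has chosen for standard output. The sketch below is hedged: the helper
can_encode is hypothetical, and 'cp437' is only an assumed example of a
legacy console code page. On many Linux systems the locale is UTF-8, which
can represent U+265E, whereas a legacy code page cannot.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = 'knight \u265E'

# Values below depend on the platform and environment they run in.
stdout_encoding = getattr(sys.stdout, 'encoding', None)
preferred = locale.getpreferredencoding(False)
print('stdout encoding:', stdout_encoding)
print('locale preferred encoding:', preferred)

utf8_ok = can_encode(sample, 'utf-8')
cp437_ok = can_encode(sample, 'cp437')
```

Here utf8_ok is True and cp437_ok is False, which mirrors the behaviour
reported above: the same print() call succeeds or fails depending on the
codec attached to the output stream.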

The set of concepts identified and explained in this article needs to be 
sufficient to underpin an understanding of the distinct data types, encodings, 
decodings, translations, settings, etc. relevant to this problem, and how to 
use them to get a desired result.

There are similar problems that occur at other Python-system boundaries, which 
would further illuminate the set of necessary concepts.

Thanks for all comments.

-- Graham

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20906>
_______________________________________