On Sat, Jan 21, 2017 at 8:21 PM, Pete Forman <petef4+use...@gmail.com> wrote:
> Marko Rauhamaa <ma...@pacujo.net> writes:
>
>>> py> low = '\uDC37'
>>
>> That should raise a SyntaxError exception.
>
> Quite. My point was that with older Python on a narrow build (Windows
> and Mac) you need to understand that you are using UTF-16 rather than
> Unicode. On a wide build or Python 3.3+ then all is rosy. (At this point
> I'm tempted to put in a winky emoji but that might push the internal
> representation into UCS-4.)

CPython allows surrogate codes for use with the "surrogateescape" and
"surrogatepass" error handlers, which are used for POSIX and Windows
file-system encoding, respectively. Maybe MicroPython goes about the
file-system round-trip problem differently, or maybe it just requires
using bytes for file-system and environment-variable names on POSIX
and doesn't care about Windows.

"surrogateescape" allows 'decoding' arbitrary bytes:

    >>> b'\x81'.decode('ascii', 'surrogateescape')
    '\udc81'
    >>> '\udc81'.encode('ascii', 'surrogateescape')
    b'\x81'

This error handler is required by CPython on POSIX to handle arbitrary
bytes in file-system paths. For example, when running with LANG=C:

    >>> import os, sys
    >>> sys.getfilesystemencoding()
    'ascii'
    >>> os.listdir(b'.')
    [b'\x81']
    >>> os.listdir('.')
    ['\udc81']

"surrogatepass" allows encoding surrogates:

    >>> '\udc81'.encode('utf-8', 'surrogatepass')
    b'\xed\xb2\x81'
    >>> b'\xed\xb2\x81'.decode('utf-8', 'surrogatepass')
    '\udc81'

This error handler is used by CPython 3.6+ to encode Windows UCS-2
file-system paths as WTF-8 (Wobbly). For example:

    >>> os.listdir('.')
    ['\udc81']
    >>> os.listdir(b'.')
    [b'\xed\xb2\x81']
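
To see the two handlers side by side on the same input, here is a rough sketch. The file name is made up, and the codec is passed explicitly rather than taken from the file-system encoding, so it should behave the same on any platform:

```python
# Made-up file name; the byte 0x81 is not valid UTF-8 on its own.
raw = b'caf\x81.txt'

# surrogateescape: an undecodable byte 0xNN becomes the lone surrogate U+DCNN,
# and encoding with the same handler restores the original byte.
text = raw.decode('utf-8', 'surrogateescape')
escaped_back = text.encode('utf-8', 'surrogateescape')

# The default 'strict' handler refuses to encode a lone surrogate.
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# surrogatepass: the surrogate code point itself is written out as three bytes.
passed = text.encode('utf-8', 'surrogatepass')
passed_back = passed.decode('utf-8', 'surrogatepass')

print(ascii(text))          # 'caf\udc81.txt'
print(escaped_back == raw)  # True
print(strict_ok)            # False
print(passed)               # b'caf\xed\xb2\x81.txt'
print(passed_back == text)  # True
```

Both handlers round-trip, but only "surrogateescape" gets you back to the original byte; "surrogatepass" preserves the surrogate code point instead.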