Pete Forman <petef4+use...@gmail.com>:

> Surrogates only exist in UTF-16. They are expressly forbidden in UTF-8
> and UTF-32.

Also, they don't exist as Unicode characters: the surrogate code points
U+D800 to U+DFFF are not scalar values. Python shouldn't allow surrogate
characters in strings.

   Thus the range of code points that are available for use as
   characters is U+0000–U+D7FF and U+E000–U+10FFFF (1,112,064 code
   points).

   <URL: https://en.wikipedia.org/wiki/Unicode>

   The Unicode Character Database is basically a table of characters
   indexed using integers called 'code points'. Valid code points are in
   the ranges 0 to #xD7FF inclusive or #xE000 to #x10FFFF inclusive,
   which is about 1.1 million code points.

   <URL: https://www.gnu.org/software/guile/docs/master/guile.html/Characters.html>

Guile does the right thing:

   scheme@(guile-user)> #\xd7ff
   $1 = #\153777
   scheme@(guile-user)> #\xe000
   $2 = #\160000
   scheme@(guile-user)> #\xd812
   While reading expression:
   ERROR: In procedure scm_lreadr: #<unknown port>:5:8: out-of-range hex character escape: xd812

> py> low = '\uDC37'

That should raise a SyntaxError exception.


Marko
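
For comparison, here is a rough Python sketch of the range check described in the two quotations above. The names is_scalar_value, count and encodable are made up for illustration and are not part of any library. It also shows what CPython 3 currently does with the quoted literal: the string is accepted, and the error only appears when encoding to UTF-8.

```python
def is_scalar_value(cp: int) -> bool:
    """Return True if cp lies in U+0000..U+D7FF or U+E000..U+10FFFF."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


# Count every code point in 0..0x10FFFF that passes the check; this
# should agree with the 1,112,064 figure quoted from Wikipedia.
count = sum(is_scalar_value(cp) for cp in range(0x110000))

# CPython 3 accepts a lone surrogate in a str literal without complaint.
low = '\uDC37'

# The strict UTF-8 codec refuses to encode it, though.
try:
    low.encode('utf-8')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(count, is_scalar_value(0xD812), encodable)
```

So the check Guile performs at read time is, in Python, deferred to the codec.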