Re: Unicode codepoints

2011-06-22 Thread Vlastimil Brom
2011/6/22 Saul Spatz : > Thanks. I agree with you about the generator. Using your first suggestion, > code points above U+FFFF get separated into two "surrogate pair" characters > from UTF-16. So instead of U+10FFFF I get U+DBFF and U+DFFF. > -- > http://mail.python.org/mailman/listinfo/python

Re: Unicode codepoints

2011-06-22 Thread jmfauth
On 22 June, 16:07, Saul Spatz wrote: > Thanks very much.  This is the elegant kind of solution I was looking for.  I > had hoped there was a way to do it without even addressing the matter of > surrogates, but apparently not.  The reason I don't like this is that it > depends on knowing that py

Re: Unicode codepoints

2011-06-22 Thread Saul Spatz
Thanks very much. This is the elegant kind of solution I was looking for. I had hoped there was a way to do it without even addressing the matter of surrogates, but apparently not. The reason I don't like this is that it depends on knowing that python internally stores strings in UTF-16. I e
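For readers on a modern interpreter: since Python 3.3 (PEP 393) a str is always indexed by code point, so the surrogate issue Saul describes no longer arises and iterating over the string is enough. A minimal sketch; `code_points` is a hypothetical helper name, and this does not apply to the narrow Python 2 builds discussed in this thread.

```python
def code_points(text):
    """Return each character of text as a U+XXXX label (Python 3.3+)."""
    # On Python 3.3+ each element of a str is a full code point, so
    # characters outside the BMP are never split into surrogate pairs.
    return ["U+{:04X}".format(ord(ch)) for ch in text]


print(code_points("a\u00e9\U0010FFFF"))  # ['U+0061', 'U+00E9', 'U+10FFFF']
```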

Re: Unicode codepoints

2011-06-22 Thread Saul Spatz
Thanks. I agree with you about the generator. Using your first suggestion, code points above U+FFFF get separated into two "surrogate pair" characters from UTF-16. So instead of U+10FFFF I get U+DBFF and U+DFFF.
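The pair Saul reports can be checked by hand: a high surrogate (U+D800..U+DBFF) and a low surrogate (U+DC00..U+DFFF) combine as 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00). A minimal sketch, with hypothetical helper names, that recombines pairs in a sequence of UTF-16 code units such as a narrow build would expose:

```python
def combine_surrogates(hi, lo):
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def iter_code_points(units):
    """Yield code points from a sequence of UTF-16 code units (ints)."""
    i = 0
    while i < len(units):
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one code point.
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield combine_surrogates(u, units[i + 1])
            i += 2
        else:
            # BMP character (or a lone surrogate, passed through unchanged).
            yield u
            i += 1


print(hex(combine_surrogates(0xDBFF, 0xDFFF)))  # 0x10ffff
```

The generator form lets the caller walk a long run of code units without building an intermediate list.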

Re: Unicode codepoints

2011-06-22 Thread jmfauth
That seems to me correct.

>>> '\\u{:04x}'.format(ord(u'é'))
\u00e9
>>> '\\U{:08x}'.format(ord(u'é'))
\U000000e9
>>>

because

>>> u'\U00e9'
  File "", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: end of string in escape sequence
>>> u'\U000000e9'
é
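jmfauth's point generalizes: \u takes exactly four hex digits and \U exactly eight, so a code point above U+FFFF needs the \U form. A minimal sketch in Python 3 syntax (the helper name `escape_code_point` is hypothetical) that picks the right form and round-trips it through the standard unicode_escape codec:

```python
def escape_code_point(cp):
    """Return a Python escape sequence for the code point cp."""
    # \u needs exactly 4 hex digits, \U exactly 8; zero-pad accordingly.
    if cp <= 0xFFFF:
        return "\\u{:04x}".format(cp)
    return "\\U{:08x}".format(cp)


for ch in ("\u00e9", "\U0001F600"):
    esc = escape_code_point(ord(ch))
    # Decoding the escape with unicode_escape gives back the original character.
    assert esc.encode("ascii").decode("unicode_escape") == ch
    print(esc)
```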

Re: Unicode codepoints

2011-06-22 Thread Peter Otten
Saul Spatz wrote: > Hi, > > I'm just starting to learn a bit about Unicode. I want to be able to read > a utf-8 encoded file, and print out the codepoints it encodes. After many > false starts, here's a script that seems to work, but it strikes me as > awfully awkward and unpythonic. Have you a
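To show what "the codepoints it encodes" means at the byte level, here is a minimal sketch of a hand-written UTF-8 decoder. It is illustrative only: it skips the checks for overlong forms and encoded surrogates, and in practice `bytes.decode('utf-8')` or opening the file with an encoding does this work. The name `utf8_code_points` is hypothetical.

```python
def utf8_code_points(data):
    """Yield the code points encoded in UTF-8 bytes (minimal validation)."""
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte's high bits give the sequence length and payload bits.
        if lead < 0x80:              # 0xxxxxxx: 1 byte (ASCII)
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:     # 110xxxxx: 2 bytes
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:    # 1110xxxx: 3 bytes
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:   # 11110xxx: 4 bytes
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError("invalid lead byte 0x%02x at offset %d" % (lead, i))
        if i + extra >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, extra + 1):
            cont = data[i + k]
            # Continuation bytes look like 10xxxxxx and carry 6 bits each.
            if cont >> 6 != 0b10:
                raise ValueError("bad continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (cont & 0x3F)
        yield cp
        i += extra + 1


text = "a\u00e9\u20ac\U0001F600"
print(["U+%04X" % cp for cp in utf8_code_points(text.encode("utf-8"))])
```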

Re: Unicode codepoints

2011-06-22 Thread Vlastimil Brom
2011/6/22 Saul Spatz : > Hi, > > I'm just starting to learn a bit about Unicode. I want to be able to read a > utf-8 encoded file, and print out the codepoints it encodes.  After many > false starts, here's a script that seems to work, but it strikes me as > awfully awkward and unpythonic.  Have
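Since a generator comes up later in the thread, here is one possible shape for it in Python 3, assuming the input is a UTF-8 text file. This is not the code posted in the thread; `iter_file_code_points` is a hypothetical name, and the demo writes its own temporary file so it runs standalone.

```python
import os
import tempfile


def iter_file_code_points(path, encoding="utf-8"):
    """Yield (code point, character) pairs from a text file."""
    # open() decodes the bytes, so on Python 3.3+ each character
    # is a full code point (no surrogate pairs to reassemble).
    with open(path, encoding=encoding) as f:
        for line in f:
            for ch in line:
                yield ord(ch), ch


# Demo: write a small UTF-8 file, then list its code points.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("h\u00e9\U0001F600\n")
try:
    pairs = list(iter_file_code_points(tmp.name))
finally:
    os.remove(tmp.name)

for cp, ch in pairs:
    print("U+{:04X} {!r}".format(cp, ch))
```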

Re: Unicode codepoints

2011-06-21 Thread Chris Angelico
On Wed, Jun 22, 2011 at 1:37 PM, Saul Spatz wrote: > Hi, > > I'm just starting to learn a bit about Unicode. I want to be able to read a > utf-8 encoded file, and print out the codepoints it encodes.  After many > false starts, here's a script that seems to work, but it strikes me as > awfully