Re: Unicode codepoints

2011-06-22 Thread Vlastimil Brom
2011/6/22 Saul Spatz wrote:
> Thanks. I agree with you about the generator. Using your first suggestion,
> code points above U+FFFF get separated into two "surrogate pair" characters
> from UTF-16. So instead of U+10FFFF I get U+DBFF and U+DFFF.

Re: Unicode codepoints

2011-06-22 Thread jmfauth
On 22 June, 16:07, Saul Spatz wrote:
> Thanks very much. This is the elegant kind of solution I was looking for. I
> had hoped there was a way to do it without even addressing the matter of
> surrogates, but apparently not. The reason I don't like this is that it
> depends on knowing that py
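One way to keep surrogate handling out of your own code is to let a codec do the work: in UTF-32 every code point occupies exactly one 4-byte unit. The sketch below assumes CPython 3.3 or later, and `code_points_utf32` is a hypothetical name. Whether a particular Python 2 narrow build's UTF-32 encoder recombines surrogate pairs is an assumption to verify on that interpreter.

```python
import struct

def code_points_utf32(s):
    """Return the code points of s by encoding to UTF-32 and reading 4-byte units.

    Assumption: on old narrow builds the UTF-32 encoder recombines surrogate
    pairs; check this on the interpreter you actually use.
    """
    data = s.encode('utf-32-be')          # big-endian, no byte-order mark
    count = len(data) // 4                # one unit per code point
    return list(struct.unpack('>%dI' % count, data))

print([hex(cp) for cp in code_points_utf32(u'a\xe9\U0010ffff')])
```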

Re: Unicode codepoints

2011-06-22 Thread Saul Spatz
Thanks very much. This is the elegant kind of solution I was looking for. I had hoped there was a way to do it without even addressing the matter of surrogates, but apparently not. The reason I don't like this is that it depends on knowing that Python internally stores strings in UTF-16. I e
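Whether the interpreter stores text as UTF-16 code units can be checked at run time instead of assumed. A minimal sketch, assuming CPython (`is_narrow_build` is a hypothetical helper): narrow builds, common for Python 2 and Python 3.0 to 3.2 (for example on Windows), report `sys.maxunicode == 0xFFFF`, while since Python 3.3 (PEP 393) every build reports `0x10FFFF` and a character above U+FFFF has length 1.

```python
import sys

def is_narrow_build():
    """Return True if this interpreter stores text as UTF-16 code units.

    Narrow builds report sys.maxunicode == 0xFFFF; wide builds and every
    CPython 3.3+ interpreter report 0x10FFFF.
    """
    return sys.maxunicode == 0xFFFF

# On a narrow build a character above U+FFFF has length 2 (a surrogate pair).
print(is_narrow_build(), len(u'\U0010ffff'))
```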

Re: Unicode codepoints

2011-06-22 Thread Saul Spatz
Thanks. I agree with you about the generator. Using your first suggestion, code points above U+FFFF get separated into two "surrogate pair" characters from UTF-16. So instead of U+10FFFF I get U+DBFF and U+DFFF.
--
http://mail.python.org/mailman/listinfo/python-list
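The two values follow from the UTF-16 surrogate arithmetic: subtract 0x10000, put the high 10 bits of the result into a high surrogate (U+D800 to U+DBFF) and the low 10 bits into a low surrogate (U+DC00 to U+DFFF). A small sketch (the name `to_surrogate_pair` is hypothetical) reproduces the U+DBFF U+DFFF result for U+10FFFF:

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    v = cp - 0x10000                  # 20-bit offset
    high = 0xD800 + (v >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogate_pair(0x10FFFF)
print("U+%04X U+%04X" % (high, low))  # U+DBFF U+DFFF
```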

Re: Unicode codepoints

2011-06-22 Thread jmfauth
That seems correct to me.

>>> '\\u{:04x}'.format(ord(u'é'))
\u00e9
>>> '\\U{:08x}'.format(ord(u'é'))
\U000000e9
>>>

because

>>> u'\U00e9'
  File "", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: end of string in escape sequence
>>> u'\U000000e9'
é
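The underlying rule is that `\u` takes exactly four hex digits and `\U` exactly eight, so the padding width must match the escape letter. A minimal sketch for Python 3 (`escape_code_point` is a hypothetical helper) picks the short form when the code point fits in four digits and checks each result by decoding it with the `unicode_escape` codec:

```python
def escape_code_point(cp):
    """Return a Python escape for one code point: \\uXXXX if it fits, else \\UXXXXXXXX."""
    if cp <= 0xFFFF:
        return '\\u{:04x}'.format(cp)
    return '\\U{:08x}'.format(cp)

for ch in ['\u00e9', '\U0010ffff']:
    esc = escape_code_point(ord(ch))
    # Decoding the ASCII escape text should give back the original character.
    assert esc.encode('ascii').decode('unicode_escape') == ch
    print(esc)
```

Using `\\u{:04x}` for a code point above U+FFFF would produce five or six hex digits, which Python would read as a four-digit `\u` escape followed by literal characters.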

Re: Unicode codepoints

2011-06-22 Thread Peter Otten
rd and unpythonic. Have you a better way?
>
> def codePoints(s):
>     ''' return a list of the Unicode codepoints in the string s '''
>     answer = []
>     skip = False
>     for k, c in enumerate(s):
>         if skip:

Re: Unicode codepoints

2011-06-22 Thread Vlastimil Brom
fully awkward and unpythonic. Have you a better way?
>
> def codePoints(s):
>     ''' return a list of the Unicode codepoints in the string s '''
>     answer = []
>     skip = False
>     for k, c in enumerate(s):
>         if skip:
>             skip = F

Re: Unicode codepoints

2011-06-21 Thread Chris Angelico
"+hex(ord(c))[2:])

But if you do need the codePoints() function, I'd do it as a generator.

> def codePoints(s):
>     ''' return a list of the Unicode codepoints in the string s '''
>     skip = False
>     for k, c in enumerate(s):
>         if skip:
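Following the generator suggestion, here is a sketch for Python 3 (the name `code_points` is hypothetical) that walks the string by index instead of carrying a `skip` flag, and recombines a high surrogate followed by a low surrogate into one code point. On a wide build or Python 3.3+, valid text contains no surrogate pairs, so it simply yields `ord()` of each character; lone surrogates are passed through unchanged.

```python
def code_points(s):
    """Yield the code points of s, recombining UTF-16 surrogate pairs."""
    i, n = 0, len(s)
    while i < n:
        cp = ord(s[i])
        # A high surrogate followed by a low surrogate encodes one code point.
        if 0xD800 <= cp <= 0xDBFF and i + 1 < n:
            low = ord(s[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                yield 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                i += 2
                continue
        yield cp
        i += 1

print(['U+%04X' % cp for cp in code_points('a\ud83d\ude00b')])
```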

Unicode codepoints

2011-06-21 Thread Saul Spatz
way?

def codePoints(s):
    ''' return a list of the Unicode codepoints in the string s '''
    answer = []
    skip = False
    for k, c in enumerate(s):
        if skip:
            skip = False
            answer.append(ord(s[k-1:k+1]))
            continue
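On Python 3.3 and later a `str` is indexed by code point on every build, so the surrogate bookkeeping in `codePoints()` is not needed there. A minimal sketch under that assumption (`code_points_py3` is a hypothetical name):

```python
def code_points_py3(s):
    """Return the code points of s; on Python 3.3+ each str item is one code point."""
    return [ord(c) for c in s]

print(['U+%04X' % cp for cp in code_points_py3('a\u00e9\U0010ffff')])
# ['U+0061', 'U+00E9', 'U+10FFFF']
```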