2011/6/22 Saul Spatz:
> Thanks. I agree with you about the generator. Using your first suggestion,
> code points above U+FFFF get separated into two "surrogate pair" characters
> from UTF-16. So instead of U+10FFFF I get U+DBFF and U+DFFF.
> --
http://mail.python.org/mailman/listinfo/python-list
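
For reference, U+DBFF and U+DFFF are exactly what the UTF-16 encoding rules give for U+10FFFF. A minimal sketch of that arithmetic follows; the helper names to_surrogates and from_surrogates are made up for illustration and are not part of any library.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10FFFF)
print("U+%04X U+%04X" % (high, low))   # U+DBFF U+DFFF
```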
On 22 June, 16:07, Saul Spatz wrote:
> Thanks very much. This is the elegant kind of solution I was looking for. I
> had hoped there was a way to do it without even addressing the matter of
> surrogates, but apparently not. The reason I don't like this is that it
> depends on knowing that python internally stores strings in UTF-16. I e
That seems correct to me.
>>> print('\\u{:04x}'.format(ord(u'é')))
\u00e9
>>> print('\\U{:08x}'.format(ord(u'é')))
\U000000e9
>>>
because \U takes exactly eight hex digits:
>>> u'\U00e9'
  File "", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 0-5: end of string in escape sequence
>>> print(u'\U000000e9')
é
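
The two format strings above can be wrapped so the right escape is chosen from the code point's size. This is only a sketch; the function name escape is hypothetical and does not come from the thread.

```python
def escape(cp):
    """Return the Python escape for code point cp: \\uXXXX or \\UXXXXXXXX."""
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: four hex digits after \u
        return '\\u{:04x}'.format(cp)
    # Supplementary planes: eight hex digits after \U
    return '\\U{:08x}'.format(cp)


print(escape(0xe9))      # \u00e9
print(escape(0x10ffff))  # \U0010ffff
```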
fully awkward and unpythonic. Have you a better way?
>
> def codePoints(s):
>     ''' return a list of the Unicode codepoints in the string s '''
>     answer = []
>     skip = False
>     for k, c in enumerate(s):
>         if skip:
>             skip = F
;+hex(ord(c))[2:])
But if you do need the codePoints() function, I'd do it as a generator.
> def codePoints(s):
>     ''' return a list of the Unicode codepoints in the string s '''
>     skip = False
>     for k, c in enumerate(s):
>         if skip:
>
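
Since the quoted generator is cut off, here is one way the generator idea might look. It is a sketch, not the code from the thread: the name code_points and the pending variable are my own, and it assumes you want surrogate pairs joined by arithmetic rather than by relying on how the interpreter stores strings. Unpaired surrogates are passed through unchanged.

```python
def code_points(s):
    """Yield the Unicode code points of s, joining UTF-16 surrogate pairs."""
    pending = None  # high surrogate waiting for its low half
    for c in s:
        cp = ord(c)
        if pending is not None:
            if 0xDC00 <= cp <= 0xDFFF:
                # Valid pair: combine the two 10-bit halves
                yield 0x10000 + ((pending - 0xD800) << 10) + (cp - 0xDC00)
                pending = None
                continue
            yield pending  # unpaired high surrogate: pass it through
            pending = None
        if 0xD800 <= cp <= 0xDBFF:
            pending = cp
        else:
            yield cp
    if pending is not None:
        yield pending  # string ended on a lone high surrogate
```

On a build that already stores U+10FFFF as one character, the pairing branch is simply never taken, so list(code_points(s)) gives the same result either way.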
Have you a better way?
def codePoints(s):
    ''' return a list of the Unicode codepoints in the string s '''
    answer = []
    skip = False
    for k, c in enumerate(s):
        if skip:
            skip = False
            answer.append(ord(s[k-1:k+1]))
            continue