Thanks very much.  This is the elegant kind of solution I was looking for.  I 
had hoped there was a way to do it without even addressing the matter of 
surrogates, but apparently not.  The reason I don't like this is that it 
depends on knowing that python internally stores strings in UTF-16.  I expected 
that there would be some built-in iterator that would return the code points.  
(Actually, this all started when I realized that s[k] wouldn't necessarily give 
me the kth character of the string s.)
        

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to