Hi,

2012/3/13 Elazar Leibovich <elaz...@gmail.com>

> 2012/3/13 kobi zamir <kobi.za...@gmail.com>
>
>>> So I guess that you're also in the UTF-8 camp.
>>
>> yes, but my opinion about utf-8 is just my opinion. i like python and
>> python defaults to utf-8.
>
> Python's internal representation is not UTF-8, but UTF-16, or UTF-32,
> depends on build parameters. Thus python doesn't really support code points
> above the BMP.
> Of course, you cannot know the internal representation, since python
> (cleverly) does not allow you to cast a unicode string to a sequence of
> bytes without specifying the result encoding.
>
> http://docs.python.org/c-api/unicode.html
>
> (see also this very good presentation
> <http://98.245.80.27/tcpc/OSCON2011/gbu.html> on internal unicode
> representations in various languages).

Nitpick: It's actually UCS-2/UCS-4 (which preceded the above but are
compatible).

Actually, one can know the internal representation by checking
sys.maxunicode [1]: it is 0xFFFF on a UCS-2 (narrow) build and 0x10FFFF on
a UCS-4 (wide) build. I'm using it in python-bidi to manually handle
surrogate pairs if needed [2].

[1] http://docs.python.org/dev/library/sys.html#sys.maxunicode
[2] https://github.com/MeirKriheli/python-bidi/blob/master/src/bidi/algorithm.py#L46

Cheers
--
Meir
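For illustration, a minimal sketch of the sys.maxunicode check and of
walking a string while keeping surrogate pairs together. This is not the
python-bidi code referenced in [2]; the names NARROW_BUILD,
iter_code_points and to_code_point are hypothetical and chosen only for
this example. Note that Python 3.3 and later always report 0x10FFFF, so
the narrow-build case only arises on older interpreters.

```python
import sys

# True on a "narrow" (UCS-2) build, where sys.maxunicode == 0xFFFF.
# On Python 3.3+ this is always False, since sys.maxunicode is 0x10FFFF.
NARROW_BUILD = sys.maxunicode == 0xFFFF


def is_high_surrogate(ch):
    """Return True if ch is a leading (high) surrogate code unit."""
    return 0xD800 <= ord(ch) <= 0xDBFF


def is_low_surrogate(ch):
    """Return True if ch is a trailing (low) surrogate code unit."""
    return 0xDC00 <= ord(ch) <= 0xDFFF


def iter_code_points(text):
    """Yield one substring per code point, keeping surrogate pairs together."""
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if (i + 1 < n and is_high_surrogate(ch)
                and is_low_surrogate(text[i + 1])):
            # A high surrogate followed by a low one forms a single
            # character above the BMP, so emit both units as one item.
            yield text[i:i + 2]
            i += 2
        else:
            yield ch
            i += 1


def to_code_point(pair):
    """Combine a two-unit surrogate pair into its code point value."""
    high, low = ord(pair[0]), ord(pair[1])
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

On a narrow build, a caller would use iter_code_points instead of
iterating over the string directly, so that a character above the BMP is
treated as one unit rather than two.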
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il