On Sun, Aug 5, 2012 at 7:19 PM, Chris Little <chris...@crosswire.org> wrote:
>
> On Aug 5, 2012, at 11:37 AM, David Haslam <dfh...@googlemail.com> wrote:
>
>> FWIW, I just came across this http://www.pythonregex.com/ Python Regular
>> Expression Testing Tool
>>
>> Does Python support the full 21-bit Unicode range?
>>
>> cf. Many other regular expression engines only support the Basic
>> Multilingual Plane.
>>
>
> Yes, Python regex supports non-BMP characters. The language tags are Plane
> 14, I believe. An engine that supports only the BMP can't be said to support
> Unicode and is probably just processing bytes.
>
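A quick way to check the non-BMP point is to match a couple of Plane 14 tag characters with the standard `re` module. This is only a minimal sketch, assuming Python 3.3 or later, where every `str` is a Unicode string and the full code point range is available; on older narrow builds the results may differ. The names `TAG_PATTERN` and `find_tag_runs` are made up for illustration.

```python
import re
import sys

# U+E0041 and U+E0042 are tag characters in Plane 14, outside the BMP.
TAG_A = "\U000E0041"
TAG_B = "\U000E0042"

# Hypothetical pattern: one or more characters from the Plane 14
# tag block (U+E0000..U+E007F).
TAG_PATTERN = re.compile("[\U000E0000-\U000E007F]+")


def find_tag_runs(text):
    """Return every run of Plane 14 tag characters found in text."""
    return TAG_PATTERN.findall(text)


sample = "abc" + TAG_A + TAG_B + "def"
runs = find_tag_runs(sample)

# Highest code point this interpreter can represent.
print(hex(sys.maxunicode))
# Each non-BMP character counts as a single character here
# (assumption: Python 3.3+).
print(len(sample), [hex(ord(ch)) for ch in runs[0]])
```

If the engine only handled the BMP, the character class above would either fail to compile or match surrogate halves rather than whole characters.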
As further explanation, Python distinguishes between "string" objects,
which are sequences of 8-bit bytes holding text in whatever encoding was
selected, and "unicode" objects, which are strings of Unicode characters.
The exact internal representation probably differs between CPython and
Jython. CPython used to use UCS-2 only, but since Unicode was extended
beyond the BMP it can be built to use either UCS-2 or UCS-4. For more
details, see
http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python
under the heading "Internal Representation".

--Greg

> --Chris
>
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page