On Sun, Aug 5, 2012 at 7:19 PM, Chris Little <chris...@crosswire.org> wrote:
>
>
> On Aug 5, 2012, at 11:37 AM, David Haslam <dfh...@googlemail.com> wrote:
>
>> FWIW, I just came across this Python Regular Expression Testing Tool:
>> http://www.pythonregex.com/
>>
>> Does Python support the full 21-bit Unicode range?
>>
>> cf. Many other regular expression engines only support the Basic
>> Multilingual Plane.
>>
>
> Yes, Python regex supports non-BMP characters. The language tags are Plane 
> 14, I believe. An engine that supports only the BMP can't be said to support 
> Unicode and is probably just processing bytes.
>

As further explanation, Python 2 distinguishes between the "str" type,
which is a sequence of 8-bit bytes holding text in whatever encoding
was chosen, and the "unicode" type, which is a string of Unicode
characters. The exact internal representation probably differs between
CPython and Jython. CPython originally used UCS-2 but, since Unicode was
extended beyond the BMP, it can be built to use either UCS-2 or UCS-4.
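
For anyone who wants to check the non-BMP behaviour locally, here is a
minimal sketch. It assumes Python 3.3 or later, where "str" holds Unicode
text and a non-BMP character counts as a single code point; on older
narrow (UCS-2) builds the same character is stored as a surrogate pair,
so the results would differ. The names find_tag_runs and TAG_CHARS are
made up for this illustration and are not part of any library.

```python
import re

# U+E0001 LANGUAGE TAG sits in Plane 14, well outside the BMP.
LANGUAGE_TAG = "\U000E0001"

# Tag characters U+E0020..U+E007F mirror ASCII 0x20..0x7F, offset by 0xE0000.
TAG_CHARS = re.compile("[\U000E0020-\U000E007F]+")


def find_tag_runs(text):
    """Return each run of Plane 14 tag characters, mapped back to ASCII."""
    return [
        "".join(chr(ord(ch) - 0xE0000) for ch in run)
        for run in TAG_CHARS.findall(text)
    ]


# Build "abc", then the language tag for "en", then "def".
sample = "abc" + LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in "en") + "def"

print(find_tag_runs(sample))              # runs of tag characters found by the regex
print(len(LANGUAGE_TAG))                  # code points in the string
print(len(LANGUAGE_TAG.encode("utf-8")))  # bytes once encoded as UTF-8
```

The last two lines show the distinction described above: the text
object is counted in characters, while the encoded form is counted in
bytes.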

To read more details see
http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python
under the heading "Internal Representation".

--Greg

> --Chris
>
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
