Re: how to avoid leading white spaces

Chris Angelico Thu, 02 Jun 2011 20:55:30 -0700

On Fri, Jun 3, 2011 at 1:44 PM, Roy Smith <[email protected]> wrote:
> In article <[email protected]>,
>  Chris Torek <[email protected]> wrote:
>
>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries (one per possible ord() value).
>
> I'm not sure what you mean by "full 16-bit Unicode string"?  Isn't
> unicode inherently 32 bit?  Or at least 20-something bit?  Things like
> UTF-16 are just one way to encode it.


The size of a Unicode character is like the size of a number. It's not
defined in terms of a storage width. However, planes 0-2 hold nearly
all the assigned printable characters, and there are only 17 planes in
total (code points stop at U+10FFFF), so (since each plane is 2^16
characters) that kinda makes Unicode 18-bit or 21-bit. UTF-16,
therefore, uses two 16-bit numbers (a surrogate pair) to store the
20-bit offset of any character beyond plane 0; UCS-2 is plain 16-bit
and can't reach those characters at all. Why do I get the feeling I've
met that before...
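
For what it's worth, the two-16-bit-numbers trick can be sketched in a
few lines of Python 3. This is only an illustration of how UTF-16
packs a code point above U+FFFF; the function names to_surrogates and
from_surrogates are made up for this post and are not stdlib calls.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in planes 1-16")
    offset = cp - 0x10000            # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 | (offset >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 | (offset & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogates(high, low):
    """Rebuild the code point from a surrogate pair (hypothetical helper)."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (MUSICAL SYMBOL G CLEF) lives in plane 1.
high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low))
```

The pair should agree with what chr(0x1D11E).encode('utf-16-be')
produces, which is an easy way to check the arithmetic.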

Chris Angelico
136E:0100 CD 20   INT 20
-- 
http://mail.python.org/mailman/listinfo/python-list
