Re: How is unicode implemented behind the scenes?

Rustom Mody Sun, 09 Mar 2014 03:37:22 -0700

On Sunday, March 9, 2014 2:09:32 PM UTC+5:30, [email protected] wrote:
> On Sunday, March 9, 2014 3:40:28 AM UTC+1, MRAB wrote:
> > On 2014-03-09 02:08, Dan Stromberg wrote:
> > > OK, I know that Unicode data is stored in an encoding on disk.
> > > But how is it stored in RAM?
> > > I realize I shouldn't write code that depends on any relevant
> > > implementation details, but knowing some of the more common
> > > implementation options would probably help build an intuition for
> > > what's going on internally.
> > > I've heard that characters are no longer all c bytes wide internally,
> > > so is it sometimes utf-8?
> > No.
> >  From Python 3.3, it's an array of 1, 2 or 4 bytes per codepoint.
> > In Python terms:
> > if all(c <= '\xFF' for c in string):
> >      use 1 byte per codepoint
> > elif all(c <= '\uFFFF' for c in string):
> >      use 2 bytes per codepoint
> > else:
> >      use 4 bytes per codepoint


> A very, very nice recursive mathematical absurdity.

As a profoundly astute mathematician, you will know that
v v n r m a
can be parsed in 42 different ways (the 5th Catalan number).
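For anyone checking the arithmetic: six words can be grouped into nested binary pairs in C_5 ways, where C_n is the n-th Catalan number. The small sketch below (function names are illustrative only) counts the groupings by recursively splitting the sequence into a left and a right part, and compares the result with the closed form C_n = binom(2n, n) / (n + 1).

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_parses(n_words: int) -> int:
    """Number of ways to fully bracket n_words items into binary groups."""
    if n_words == 1:
        return 1
    # Split into a left part of k words and a right part of n_words - k words,
    # then combine every parse of the left with every parse of the right.
    return sum(count_parses(k) * count_parses(n_words - k)
               for k in range(1, n_words))


def catalan(n: int) -> int:
    """Closed form: C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


print(count_parses(6), catalan(5))  # six words vs. C_5 -> 42 42
```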

Which parse did you intend?
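The rule MRAB quotes is CPython's flexible string representation (PEP 393, Python 3.3+): the width is chosen from the largest code point in the string. Below is a minimal runnable sketch of that selection rule only, not CPython's actual C code; `bytes_per_codepoint` and `observed_width` are hypothetical helper names. The second helper is CPython-specific and assumes that `sys.getsizeof` grows by exactly the per-code-point width for each extra character of a freshly built string; absolute object sizes differ between versions and are not relied on.

```python
import sys


def bytes_per_codepoint(s: str) -> int:
    """Hypothetical helper: the width CPython 3.3+ would choose for s."""
    highest = max(map(ord, s), default=0)  # empty string counts as 1-byte
    if highest <= 0xFF:
        return 1  # every code point fits in Latin-1
    if highest <= 0xFFFF:
        return 2  # every code point is in the Basic Multilingual Plane
    return 4      # at least one code point lies beyond the BMP


def observed_width(ch: str) -> int:
    """CPython-specific: size growth from appending one more copy of ch."""
    return sys.getsizeof(ch * 10) - sys.getsizeof(ch * 9)


for sample in ("a", "\xe9", "\u20ac", "\U0001F600"):
    print(repr(sample), bytes_per_codepoint(sample), observed_width(sample))
```

Note the `\u` escape at the 2-byte boundary: in Python source, `'\xFFFF'` is the three-character string `'ÿFF'`, not U+FFFF.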


-- 
https://mail.python.org/mailman/listinfo/python-list
