On Sunday, March 9, 2014 2:09:32 PM UTC+5:30, wxjm...@gmail.com wrote:
> On Sunday, 9 March 2014 03:40:28 UTC+1, MRAB wrote:
> > On 2014-03-09 02:08, Dan Stromberg wrote:
> > > OK, I know that Unicode data is stored in an encoding on disk.
> > > But how is it stored in RAM?
> > > I realize I shouldn't write code that depends on any relevant
> > > implementation details, but knowing some of the more common
> > > implementation options would probably help build an intuition for
> > > what's going on internally.
> > > I've heard that characters are no longer all c bytes wide internally,
> > > so is it sometimes utf-8?
> > No.
> > From Python 3.3, it's an array of 1, 2 or 4 bytes per codepoint.
> > In Python terms:
> > if all(c <= '\xFF' for c in string):
> >     use 1 byte per codepoint
> > elif all(c <= '\uFFFF' for c in string):
> >     use 2 bytes per codepoint
> > else:
> >     use 4 bytes per codepoint
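For anyone who wants to poke at the rule quoted above, here is a minimal runnable sketch of it. The helper name bytes_per_codepoint is my own invention, not anything CPython exposes, and it only mirrors the quoted rule; it does not inspect the interpreter's actual buffers.

```python
def bytes_per_codepoint(s: str) -> int:
    """Width the quoted 1/2/4-byte rule would pick for s (hypothetical helper)."""
    # The widest code point decides the width for the whole string.
    # An empty string is treated as the 1-byte case here (an assumption).
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:
        return 1
    if widest <= 0xFFFF:
        return 2
    return 4


# One sample per case: ASCII, Latin-1, BMP (euro sign), astral (emoji).
for sample in ("abc", "caf\u00e9", "\u20ac100", "\U0001F600"):
    print(ascii(sample), bytes_per_codepoint(sample))
```

Note that a single wide character widens every element of that string, which is why the check is over the whole string and not per character.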
> A very, very nice recursive mathematical absurdity.

As a profoundly astute mathematician:
"v v n r m a" can be parsed in 42 different ways (the 5th Catalan number).
Which parse did you intend?

-- 
https://mail.python.org/mailman/listinfo/python-list
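The figure of 42 can be checked with a short sketch, assuming "parse" means a full binary bracketing of the six words. The function names catalan and bracketings are mine, chosen for illustration.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number via the closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def bracketings(words):
    """Yield every full binary bracketing of words as nested tuples."""
    if len(words) == 1:
        yield words[0]
        return
    # Split the sequence at every position and bracket each side recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                yield (left, right)


words = "very very nice recursive mathematical absurdity".split()
parses = list(bracketings(words))

# Six words have catalan(5) binary bracketings; both counts should agree.
print(len(parses), catalan(len(words) - 1))
```

Six words need five pairings, hence the 5th Catalan number and not the 6th.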