Re: How is unicode implemented behind the scenes?

Mark Lawrence Sun, 09 Mar 2014 07:56:18 -0700

On 09/03/2014 10:32, Rustom Mody wrote:

On Sunday, March 9, 2014 2:09:32 PM UTC+5:30, wxjm...@gmail.com wrote:

On Sunday, 9 March 2014 03:40:28 UTC+1, MRAB wrote:

On 2014-03-09 02:08, Dan Stromberg wrote:

OK, I know that Unicode data is stored in an encoding on disk.
But how is it stored in RAM?
I realize I shouldn't write code that depends on any relevant
implementation details, but knowing some of the more common
implementation options would probably help build an intuition for
what's going on internally.
I've heard that characters are no longer all c bytes wide internally,
so is it sometimes utf-8?

No.
From Python 3.3, it's an array of 1, 2 or 4 bytes per codepoint.
In Python terms:
if all(c <= '\xFF' for c in string):
    width = 1  # use 1 byte per codepoint
elif all(c <= '\uFFFF' for c in string):
    width = 2  # use 2 bytes per codepoint
else:
    width = 4  # use 4 bytes per codepoint
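
The rule above can be checked with a rough sketch, assuming CPython 3.3 or later. The names bytes_per_codepoint and measured_width are made up for illustration and are not part of any library; the measurement relies on sys.getsizeof and is a CPython implementation detail, so other interpreters may report different numbers.

```python
import sys


def bytes_per_codepoint(s):
    """Width predicted by the rule above: 1, 2 or 4 bytes per codepoint."""
    highest = max(map(ord, s), default=0)  # an empty string counts as width 1
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def measured_width(ch, n=1000):
    """Estimate the per-codepoint storage of a string made only of ch.

    Comparing two lengths cancels the fixed object header, leaving
    only the cost of the n extra codepoints.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    for ch in ("a", "\xe9", "\u20ac", "\U0001F600"):
        print(hex(ord(ch)), bytes_per_codepoint(ch), measured_width(ch))
```

Note that the width is chosen per string, not per character: one codepoint above U+FFFF makes every codepoint in that string take 4 bytes.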

A very, very nice recursive mathematical absurdity.


As a profoundly astute mathematician
v v n r m a
can be parsed in 42 different ways (the 5th Catalan number)

Which parse did you intend?


Please don't feed this particular troll, it's a complete waste of time.

--

My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.


Mark Lawrence



--
https://mail.python.org/mailman/listinfo/python-list
