On Thu, Mar 5, 2015, at 09:06, Steven D'Aprano wrote:
> I mostly agree with Chris. Supporting *just* the BMP is non-trivial in
> UTF-8 and UTF-32, since that goes against the grain of the system. You
> would have to program in artificial restrictions that otherwise don't
> exist.

UTF-8 is already restricted from representing values above 0x10FFFF, even though the encoding scheme itself can "naturally" represent values up to 0x1FFFFF in four bytes, up to 0x3FFFFFF in five bytes, and up to 0x7FFFFFFF in six bytes. If anything, the BMP represents a natural boundary, since it coincides with the values that can be represented in at most three bytes. Likewise, UTF-32 can obviously represent values up to 0xFFFFFFFF. You're programming in artificial restrictions either way; it's just a question of what those restrictions are.
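
For anyone who wants to check the byte-length limits quoted above, here is a rough sketch. It assumes the original 1-to-6-byte UTF-8 bit layout (a lead byte with n high one-bits followed by n-1 continuation bytes of six payload bits each); max_code_point is a name made up for this illustration, not something from the standard library.

```python
def max_code_point(n_bytes: int) -> int:
    """Largest value an n-byte sequence can carry under the original
    1-to-6-byte UTF-8 bit layout (i.e. ignoring the 0x10FFFF cap)."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("the original UTF-8 layout defines 1 to 6 bytes")
    if n_bytes == 1:
        payload_bits = 7                              # 0xxxxxxx
    else:
        lead_bits = 7 - n_bytes                       # e.g. 1110xxxx leaves 4 bits
        payload_bits = lead_bits + 6 * (n_bytes - 1)  # each 10xxxxxx adds 6 bits
    return (1 << payload_bits) - 1


limits = {n: max_code_point(n) for n in range(1, 7)}
for n, limit in limits.items():
    print(f"{n} byte(s): up to 0x{limit:X}")

# Python's own encoder agrees on where the three-byte range ends.
assert len(chr(0xFFFF).encode("utf-8")) == 3
assert len(chr(0x10000).encode("utf-8")) == 4
```

The three-byte limit is exactly 0xFFFF, the top of the BMP, while the 0x10FFFF cap falls partway through the four-byte range.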