On 02/01/2014 18:37, Terry Reedy wrote:
On 1/2/2014 12:36 PM, Robin Becker wrote:

I just spent a large amount of effort porting reportlab to a version
which works with both python2.7 and python3.3. I have a large number of
functions etc which handle the conversions that differ between the two
pythons.

I imagine that this was not fun.

indeed :)

For fairly sensible reasons we changed the internal default to use
unicode rather than bytes.

Do you mean 'from __future__ import unicode_literals'?

No. Previously the lower levels of the code defaulted to utf8-encoded strings, and the text functions accepted either unicode or utf8 string literals as input. As part of the port we decided to change the default from utf8 str (bytes) to unicode.
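
For readers following along, here is a minimal sketch of the kind of input normalisation described above: accept either unicode or utf8-encoded bytes and hand unicode to the lower levels. The names `as_unicode` and `text_type` are hypothetical and are not taken from the reportlab source; the sketch is meant to run unchanged on both 2.7 and 3.3.

```python
import sys

PY3 = sys.version_info[0] >= 3
# On Python 3 only the first branch is evaluated, so the name
# `unicode` is never looked up there.
text_type = str if PY3 else unicode  # noqa: F821


def as_unicode(value, encoding="utf8"):
    """Return value as a unicode string, decoding bytes if needed.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # bytes is an alias of str on 2.7
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


print(repr(as_unicode(b"caf\xc3\xa9")))
print(repr(as_unicode(u"caf\xe9")))
```

With a helper like this at the public boundary, the internal code only ever sees one text type, which is the point of switching the default.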

Am I correct in thinking that this change increases the capabilities of
reportlab? For instance, easily producing an article with abstracts in English,
Arabic, Russian, and Chinese?

It's made no real difference to what we can produce or accept: utf8 and unicode can both encode anything in the input, and what can be produced depends mainly on fonts.

After doing all that and making the tests
...........
I know some of these tests are fairly variable, but even for simple
things like paragraph parsing 3.3 seems to be slower. Since both use
unicode internally, it can't be that, can it? Or is python 2.7's unicode
faster?

The new unicode implementation in 3.3 is faster for some operations and slower
for others. It is definitely more space efficient, especially compared to a wide
build system. It is definitely less buggy, especially compared to a narrow build
system.
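
A rough way to see the space-efficiency point on CPython 3.3 or later is to compare `sys.getsizeof` for strings of equal length but different widest character. This is only a sketch: the byte counts are CPython implementation details (the flexible string representation of PEP 393) and vary between versions, so none are quoted here.

```python
import sys

# Three strings of 1000 characters each; only the widest code point differs.
ascii_text = "a" * 1000            # fits in 1 byte per character
bmp_text = "\u4e2d" * 1000         # a CJK character inside the BMP
astral_text = "\U0001F600" * 1000  # a non-BMP (astral) character

for label, s in [("ascii", ascii_text),
                 ("bmp", bmp_text),
                 ("astral", astral_text)]:
    # len() is the same for all three; the memory footprint is not.
    print(label, len(s), sys.getsizeof(s))
```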

Do your tests use any astral (non-BMP) chars? If so, do they pass on narrow 2.7
builds (like on Windows)?

I'm not sure whether we have any non-BMP characters in the tests; it's simple CJK etc. for the most part. I'm fairly certain we have no ability to handle composed (multi-codepoint) glyphs.
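
One way to answer the question about the test data would be to scan it, roughly as sketched below. The helper names `has_astral` and `has_combining` are made up for this example. The surrogate check is there because a narrow 2.7 build exposes an astral character as a surrogate pair rather than as a single code point.

```python
import unicodedata


def has_astral(text):
    """True if text contains a non-BMP character (or a surrogate,
    which is how narrow builds expose non-BMP characters)."""
    return any(ord(ch) > 0xFFFF or 0xD800 <= ord(ch) <= 0xDFFF
               for ch in text)


def has_combining(text):
    """True if text contains a combining mark, i.e. some glyph is
    spelled with more than one code point."""
    return any(unicodedata.combining(ch) for ch in text)


print(has_astral(u"\u4e2d\u6587"), has_astral(u"\U0001F600"))
print(has_combining(u"\xe9"), has_combining(u"e\u0301"))
```

Normalising with `unicodedata.normalize("NFC", text)` folds some sequences such as `e` plus U+0301 into a single precomposed code point, but not every combining sequence has a precomposed form, so this is a partial remedy at best.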



....
For one thing, indexing and slicing just work on all machines for all unicode
strings. Code that supports both 2.7 and 3.3 either a) does not index or slice,
b) does not work for all text on 2.7 narrow builds, or c) has extra conditional
code only for 2.7.
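
As an illustration of option c), the extra conditional code might look something like the sketch below. `code_points` is a hypothetical helper, not something from reportlab or the standard library. On 3.3, and on wide 2.7 builds, it simply returns `list(text)`; only narrow builds take the surrogate-pairing branch.

```python
import sys

# Narrow builds (e.g. 2.7 on Windows) report 0xFFFF here.
NARROW_BUILD = sys.maxunicode == 0xFFFF


def code_points(text):
    """Split text into a list of whole code points, joining
    surrogate pairs on narrow builds."""
    if not NARROW_BUILD:
        return list(text)
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        is_pair = (u"\ud800" <= ch <= u"\udbff" and i + 1 < n
                   and u"\udc00" <= text[i + 1] <= u"\udfff")
        if is_pair:
            out.append(text[i:i + 2])  # keep both halves together
            i += 2
        else:
            out.append(ch)
            i += 1
    return out


sample = u"a\U0001F600b"
print(len(code_points(sample)))
```

Indexing or slicing the returned list then behaves the same everywhere, at the cost of an extra pass over the text on narrow builds.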


probably
--
Robin Becker
