New submission from Ammar Askar:

Currently, for string concatenation, ceval has a fast branch it can take if
both operands are unicode types. However, since this check is an exact-type
check, subtypes of unicode end up going through the slow code path:
PyNumber_Add -> PyUnicode_Concat.
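
To illustrate the distinction between an exact-type check and a subtype
check, here is a minimal Python-level sketch. It is only an analogy for the
C-level checks in ceval, not the patch itself; the class `Tagged` and the two
helper functions are hypothetical names made up for this example.

```python
class Tagged(str):
    """Hypothetical str subclass, standing in for a type such as Markup."""


def is_exact_str(obj):
    # Python-level analogue of an exact-type check: True only for str itself,
    # False for any subclass of str.
    return type(obj) is str


def is_str_or_subtype(obj):
    # Python-level analogue of a subtype-aware check: True for str and for
    # any subclass of str.
    return isinstance(obj, str)


plain = "abc"
tagged = Tagged("abc")

# The subclass fails the exact check, so under the current behavior described
# above it would not qualify for the fast branch.
print(is_exact_str(plain), is_exact_str(tagged))
print(is_str_or_subtype(plain), is_str_or_subtype(tagged))

# Concatenation still works for the subclass; with no __add__ override the
# result is a plain str.
result = tagged + "def"
print(type(result).__name__, result)
```

The point of the sketch is only that `Tagged` instances are valid strings
for every practical purpose, yet an exact-type test rejects them.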

This patch aims to allow subtypes to take that optimized branch without 
breaking any existing behavior and without any more memory copy calls than 
necessary.

The motivation for this change is that some templating engines
(Mako/Jinja2/Cheetah) use libraries like MarkupSafe, which is implemented with
a unicode subtype called `Markup`. Concatenating these custom objects (pretty
common for templating engines) is fairly slow. This change modifies and reuses
the existing CPython code to make it a fair bit faster.

I think the only real "dangerous" change in here is in the
cast_unicode_subtype_to_base function, which uses a trick at the end to
prevent deallocation of memory. I've made sure to keep it well commented, but
I'd appreciate any feedback on it.

From what I can tell from running the test suite, all tests pass and there
don't seem to be any new reference leaks.

----------
components: Interpreter Core
files: python.diff
keywords: patch
messages: 269849
nosy: ammar2, benjamin.peterson, ezio.melotti, haypo, lemburg, pitrou
priority: normal
severity: normal
status: open
title: Allow subtypes of unicode/str to hit the optimized unicode_concatenate 
block
type: performance
Added file: http://bugs.python.org/file43631/python.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27458>
_______________________________________