Serhiy Storchaka <storch...@gmail.com> added the comment:

> Here is a new patch using _PyUnicodeWriter directly in longobject.c.

Might it be worth doing that in a separate issue?

> decimal digits) is 17% faster with my patch version 2 compared to tip,
> and 38% faster compared to Python 3.3 before my optimizations on
> str%tuples or str.format(). Creating a temporary PyUnicode is not
> cheap, at least for short strings.

Here is my benchmark script (attached) and the results, in microseconds
per call (best of the timeit.repeat() runs):

Python 3.3 (vanilla):

1.65    str(12345)
2.6     '{}'.format(12345)
2.69    'A{}'.format(12345)
3.16    '\x80{}'.format(12345)
3.23    '\u0100{}'.format(12345)
3.32    '\U00010000{}'.format(12345)
4.6     '{:-10}'.format(12345)
4.89    'A{:-10}'.format(12345)
5.53    '\x80{:-10}'.format(12345)
5.71    '\u0100{:-10}'.format(12345)
5.63    '\U00010000{:-10}'.format(12345)
4.6     '{:,}'.format(12345)
4.71    'A{:,}'.format(12345)
5.28    '\x80{:,}'.format(12345)
5.65    '\u0100{:,}'.format(12345)
5.59    '\U00010000{:,}'.format(12345)

Python 3.3 + format_writer.patch:

1.72    str(12345)
2.74    '{}'.format(12345)
2.99    'A{}'.format(12345)
3.4     '\x80{}'.format(12345)
3.52    '\u0100{}'.format(12345)
3.51    '\U00010000{}'.format(12345)
4.24    '{:-10}'.format(12345)
4.6     'A{:-10}'.format(12345)
5.16    '\x80{:-10}'.format(12345)
6.87    '\u0100{:-10}'.format(12345)
6.83    '\U00010000{:-10}'.format(12345)
4.12    '{:,}'.format(12345)
4.6     'A{:,}'.format(12345)
5.09    '\x80{:,}'.format(12345)
6.63    '\u0100{:,}'.format(12345)
6.42    '\U00010000{:,}'.format(12345)

Python 3.3 + format_writer-2.patch: 

1.91    str(12345)
2.44    '{}'.format(12345)
2.61    'A{}'.format(12345)
3.08    '\x80{}'.format(12345)
3.31    '\u0100{}'.format(12345)
3.13    '\U00010000{}'.format(12345)
4.57    '{:-10}'.format(12345)
4.96    'A{:-10}'.format(12345)
5.52    '\x80{:-10}'.format(12345)
7.01    '\u0100{:-10}'.format(12345)
7.34    '\U00010000{:-10}'.format(12345)
4.42    '{:,}'.format(12345)
4.76    'A{:,}'.format(12345)
5.16    '\x80{:,}'.format(12345)
7.2     '\u0100{:,}'.format(12345)
6.74    '\U00010000{:,}'.format(12345)

As you can see, there are regressions, and in some cases they are no
smaller than the improvements.
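
To put numbers on that, here is a minimal sketch that computes the
relative change between the vanilla and format_writer-2.patch columns
for four rows copied from the tables above. The helper name
percent_change and the choice of rows are mine, for illustration only;
positive values mean the patched build is slower.

```python
# Timings in microseconds per call, copied from the result tables above.
vanilla = {
    "str(12345)": 1.65,
    "'{}'.format(12345)": 2.6,
    "'\\u0100{:-10}'.format(12345)": 5.71,
    "'\\u0100{:,}'.format(12345)": 5.65,
}
patch2 = {
    "str(12345)": 1.91,
    "'{}'.format(12345)": 2.44,
    "'\\u0100{:-10}'.format(12345)": 7.01,
    "'\\u0100{:,}'.format(12345)": 7.2,
}


def percent_change(before, after):
    """Return the relative change in percent; positive means slower."""
    return (after - before) / before * 100.0


for stmt in vanilla:
    change = percent_change(vanilla[stmt], patch2[stmt])
    print("%+6.1f%%\t%s" % (change, stmt))
```

For these rows it reports roughly +16%, -6%, +23% and +27%, so the gain
on the plain '{}' case is smaller than the loss on the other three.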

----------
Added file: http://bugs.python.org/file25506/issue14744-bench-1.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14744>
_______________________________________
import timeit

def bench(stmt, msg=None, number=100000):
    # Print the best per-call time for stmt, in microseconds.
    if msg is None:
        msg = stmt
    best = min(timeit.repeat(stmt, number=number))
    print("%.3g\t%s" % (best * 1e6 / number, msg))

bench('str(12345)')
for s in ('', 'A', '\u0080', '\u0100', '\U00010000'):
    bench('%a.format(12345)' % (s + '{}',))
for s in ('', 'A', '\u0080', '\u0100', '\U00010000'):
    bench('%a.format(12345)' % (s + '{:-10}',))
for s in ('', 'A', '\u0080', '\u0100', '\U00010000'):
    bench('%a.format(12345)' % (s + '{:,}',))
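
A small sketch of how the loops above build each benchmarked statement:
%a applies ascii() to the format string, which is why the result tables
show '\x80' where the script has '\u0080'. The helper name
build_statement is hypothetical and not part of the attached script.

```python
def build_statement(prefix, spec):
    """Return the source text that bench() would time for this prefix and spec."""
    # %a inserts ascii(prefix + spec): a quoted literal with non-ASCII escaped.
    return '%a.format(12345)' % (prefix + spec,)


for prefix in ('', 'A', '\u0080', '\u0100', '\U00010000'):
    stmt = build_statement(prefix, '{}')
    # eval() is used here only to show that the generated text is valid Python.
    print(stmt, '->', ascii(eval(stmt)))
```

The same helper works for the other two specs, for example
build_statement('A', '{:,}').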