[Python-Dev] Hash computation enhancement for {buffer, string, unicode}object

2015-09-14 Thread Patrascu, Alecsandru
Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel 
Corporation.

I would like to submit a patch that improves the performance of the hash 
computation code on stringobject, bufferobject and unicodeobject. As can be 
seen from the attached sample performance results from the Grand Unified Python 
Benchmark, speedups up to 40% were observed. Furthermore, we see a 5-7% 
performance improvement on OpenStack/Swift, where most of the code is in 
Python 2.7.
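
For context, the per-byte loop that CPython 2.7 uses to hash a str can be modelled in Python roughly as below. This is only a sketch of the existing, pre-patch algorithm, assuming a 64-bit build with hash randomization disabled (prefix and suffix secrets of 0); it does not reproduce the attached patch, and the name `string_hash_27` is made up for illustration.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit C long


def string_hash_27(data: bytes) -> int:
    """Model of the Python 2.7 str hash, with the hash secret assumed to be 0."""
    if not data:
        return 0                       # the empty string hashes to 0
    x = (data[0] << 7) & MASK          # seed from the first byte
    for byte in data:                  # one multiply and one xor per byte
        x = ((1000003 * x) & MASK) ^ byte
    x ^= len(data)
    if x >= 1 << 63:                   # reinterpret as a signed 64-bit value
        x -= 1 << 64
    return -2 if x == -1 else x        # -1 is reserved as an error marker


print(string_hash_27(b"a"))
```

Every byte costs one dependent multiply-and-xor step, which is why this loop shows up in so many string-heavy benchmarks.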

Attached is the patch, which modifies the Objects/stringobject.c, 
Objects/bufferobject.c and Objects/unicodeobject.c files. We built and tested 
this patch for Python 2.7 on our Linux machines (CentOS 7/Ubuntu Server 14.04, 
Intel Xeon Haswell/Broadwell with 18/8 cores). 

I've also opened an issue on the bug tracker: http://bugs.python.org/issue25106

Steps to apply the patch:
1.  hg clone https://hg.python.org/cpython cpython 
2.  cd cpython 
3.  hg update 2.7
4.  Copy hash8.patch to the current directory 
5.  hg import --no-commit hash8.patch
6.  ./configure 
7.  make



In the following, please find our sample performance results measured on a XEON 
Haswell machine.  

Hardware (HW):      Intel XEON (Haswell) 18 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.0GHz by
                        echo 200 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 200 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to
                    reduce run to run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:  Grand Unified Python Benchmark (GUPB)
GUPB Source: https://hg.python.org/benchmarks/  
  

Python2.7 results:
Python source: hg clone https://hg.python.org/cpython cpython
Python Source: hg update 2.7

Benchmarks          Speedup(%)
unpack_sequence     40.32733766
chaos               24.84002537
chameleon           23.01392651
silent_logging      22.27202911
django              20.83842317
etree_process       20.46968294
nqueens             20.34234985
pathlib             19.63445919
pidigits            19.34722148
etree_generate      19.25836634
pybench             19.06895825
django_v2           18.06073108
etree_iterparse     17.3797149
fannkuch            17.08120879
pickle_list         16.60363602
raytrace            16.0316265
slowpickle          15.86611184
pickle_dict         15.30447114
call_simple         14.42909032
richards            14.2949594
simple_logging      13.6522626
etree_parse         13.38113097
json_dump_v2        12.2655
float               11.88164311
mako                11.20606516
spectral_norm       11.04356684
hg_startup          10.57686164
mako_v2             10.37912648
slowunpickle        10.24030714
go                  10.03567319
meteor_contest      9.956231435
normal_startup      9.607401586
formatted_logging   9.601244811
html5lib            9.082603748
2to3                8.741557816
html5lib_warmup     8.268150981
nbody               7.507012306
regex_compile       7.153922724
bzr_startup         7.140244739
telco               6.869411927
slowspitfire        5.746323922
tornado_http        5.24360121
rietveld            3.865704876
regex_v8            3.777622219
hexiom2             3.586305282
json_dump           3.477551682
spambayes           3.183991854
fastunpickle        2.971645347
fastpickle          0.673086656
regex_effbot        0.127946837
json_load           0.023727176
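
The message does not say how the Speedup(%) column was derived. One common convention, shown here purely as an assumption, is the ratio of baseline to patched run time expressed as a percentage gain; the function name and the sample timings are hypothetical.

```python
def speedup_percent(baseline_seconds: float, patched_seconds: float) -> float:
    """Assumed definition: how much faster the patched build is, in percent."""
    return (baseline_seconds / patched_seconds - 1.0) * 100.0


# Hypothetical timings, not taken from the results table.
print(round(speedup_percent(1.4, 1.0), 2))
```

Under this definition a benchmark that drops from 1.4 s to 1.0 s counts as a 40% speedup; other definitions (for example, time saved relative to the baseline) would give different figures for the same measurements.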

Thank you,
Alecsandru


hash8-v01.patch
Description: hash8-v01.patch
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] semantics of subclassing things from itertools

2015-09-14 Thread Serhiy Storchaka

On 10.09.15 15:50, Maciej Fijalkowski wrote:
> On Thu, Sep 10, 2015 at 10:26 AM, Serhiy Storchaka wrote:
>> There is another reason why itertools iterators can't be implemented as
>> simple generator functions. All iterators are pickleable in 3.x.
>
> maybe the documentation should reflect that? (note that generators are
> pickleable on pypy anyway)


This pickling is not compatible with CPython. So even if itertools 
classes were not subclassable, you would need to implement itertools 
iterators as classes.
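
The difference between class-based iterators and generators under pickling can be seen with a small sketch. It uses a built-in list iterator rather than an itertools class as a deliberate substitution, because later CPython releases deprecated (3.12) and then removed (3.14) pickling support for itertools objects; the behaviour shown is CPython's.

```python
import pickle

# A class-based iterator carries its position through a pickle round trip.
it = iter([10, 20, 30])
next(it)                                   # consume the first item
restored = pickle.loads(pickle.dumps(it))
remaining = list(restored)                 # resumes after the consumed item


def numbers():
    yield from [10, 20, 30]


# A generator object has no pickle support in CPython.
try:
    pickle.dumps(numbers())
    generator_pickled = True
except TypeError:
    generator_pickled = False

print(remaining, generator_pickled)
```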




[Python-Dev] What happens of the Python 3.4 branch?

2015-09-14 Thread Victor Stinner
Hi,

Python 3.5.0 was released. What happens to the 3.4 branch in
Mercurial? Does it still accept bugfixes, or is it only for security
fixes now?

Victor


Re: [Python-Dev] What happens of the Python 3.4 branch?

2015-09-14 Thread Larry Hastings



On 09/14/2015 09:29 AM, Victor Stinner wrote:
> Python 3.5.0 was released. What happens to the 3.4 branch in
> Mercurial? Does it still accept bugfixes, or is it only for security
> fixes now?


Nothing has been announced or decided.  As release manager I suppose I 
get some say.  Here, I'll propose something:


   Python 3.4.4 rc1 should be released on Sunday October 4th.
   Python 3.4.4 final should be released on Sunday October 13th.
   After the tag of 3.4.4, Python 3.4 should enter security-fixes-only
   mode, and any future releases (3.4.5+) will be source code only.

How's that?


//arry/


Re: [Python-Dev] What happens of the Python 3.4 branch?

2015-09-14 Thread Mark Lawrence

On 14/09/2015 10:49, Larry Hastings wrote:
> On 09/14/2015 09:29 AM, Victor Stinner wrote:
>> Python 3.5.0 was released. What happens to the 3.4 branch in
>> Mercurial? Does it still accept bugfixes, or is it only for security
>> fixes now?
>
> Nothing has been announced or decided.  As release manager I suppose I
> get some say.  Here, I'll propose something:
>
> Python 3.4.4 rc1 should be released on Sunday October 4th.
> Python 3.4.4 final should be released on Sunday October 13th.
> After the tag of 3.4.4, Python 3.4 should enter security-fixes-only
> mode, and any future releases (3.4.5+) will be source code only.
>
> How's that?
>
> //arry/


Sorry but Sunday October 13th doesn't suit me, how about Sunday October 
11th or Sunday October 18th?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: [Python-Dev] What happens of the Python 3.4 branch?

2015-09-14 Thread Larry Hastings



On 09/14/2015 11:37 AM, Mark Lawrence wrote:
> Sorry but Sunday October 13th doesn't suit me, how about Sunday
> October 11th or Sunday October 18th?




Fair enough.  Sunday October 11th, 2015.

On second thought, it's probably best not to wait until 2019.


//arry/
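
The dates in this exchange can be checked with the standard library. The sketch below assumes nothing beyond the calendar itself; the helper and variable names are made up for illustration.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_name(d: date) -> str:
    """Weekday name, independent of the current locale."""
    return DAY_NAMES[d.weekday()]


proposed = day_name(date(2015, 10, 13))    # the originally proposed date
agreed = day_name(date(2015, 10, 11))      # the corrected date

# First year from 2015 onward in which October 13th falls on a Sunday.
sunday_13th_year = next(
    year for year in range(2015, 2100)
    if date(year, 10, 13).weekday() == 6
)

print(proposed, agreed, sunday_13th_year)
```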


Re: [Python-Dev] [Python-checkins] cpython: In-line the append operations inside deque_inplace_repeat().

2015-09-14 Thread Brett Cannon
Would it be worth adding a comment that the block of code is an inlined
copy of deque_append()? Or maybe even turn the append() function into a
macro so you minimize code duplication?

On Sat, 12 Sep 2015 at 08:00 raymond.hettinger wrote:

> https://hg.python.org/cpython/rev/cb96ffe6ff10
> changeset:   97943:cb96ffe6ff10
> parent:      97941:b8f3a01937be
> user:        Raymond Hettinger
> date:        Sat Sep 12 11:00:20 2015 -0400
> summary:
>   In-line the append operations inside deque_inplace_repeat().
>
> files:
>   Modules/_collectionsmodule.c |  22 ++++++++++++++++++----
>   1 files changed, 18 insertions(+), 4 deletions(-)
>
>
> diff --git a/Modules/_collectionsmodule.c b/Modules/_collectionsmodule.c
> --- a/Modules/_collectionsmodule.c
> +++ b/Modules/_collectionsmodule.c
> @@ -567,12 +567,26 @@
>      if (n > MAX_DEQUE_LEN)
>          return PyErr_NoMemory();
>
> +    deque->state++;
>      for (i = 0 ; i < n-1 ; i++) {
> -        rv = deque_append(deque, item);
> -        if (rv == NULL)
> -            return NULL;
> -        Py_DECREF(rv);
> +        if (deque->rightindex == BLOCKLEN - 1) {
> +            block *b = newblock(Py_SIZE(deque) + i);
> +            if (b == NULL) {
> +                Py_SIZE(deque) += i;
> +                return NULL;
> +            }
> +            b->leftlink = deque->rightblock;
> +            CHECK_END(deque->rightblock->rightlink);
> +            deque->rightblock->rightlink = b;
> +            deque->rightblock = b;
> +            MARK_END(b->rightlink);
> +            deque->rightindex = -1;
> +        }
> +        deque->rightindex++;
> +        Py_INCREF(item);
> +        deque->rightblock->data[deque->rightindex] = item;
>      }
> +    Py_SIZE(deque) += i;
>      Py_INCREF(deque);
>      return (PyObject *)deque;
>  }
>
> --
> Repository URL: https://hg.python.org/cpython
> ___
> Python-checkins mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-checkins
>


Re: [Python-Dev] [Numpy-discussion] The process I intend to follow for any proposed changes to NumPy

2015-09-14 Thread Chris Barker
Travis,

I'm sure you appreciate that this might all look a bit scary, given the
recent discussion about numpy governance.

But it's an open-source project, and I, at least, fully understand that
going through a big process is NOT the way to get a new idea tried out and
implemented. So I think this is a great development -- I know I want
to see something like this dtype work done.

So, as someone who has been around this community for a long time, and
dependent on Numeric, numarray, and numpy over the years, this looks like a
great development.

And, in fact, with the new governance effort -- I think less scary --
people can go off and work on a branch or fork, do good stuff, and we, as a
community, can be assured that API (or even ABI) changes won't be thrust
upon us unawares :-)

As for the technical details -- I get a bit lost, not fully understanding
the current dtype system either, but do your ideas take us in the direction
of having dtypes independent of the container and ufunc machinery, and
thus making it easier to create new dtypes (even in Python)? 'Cause that
would be great.

I hope you find the partner you're looking for -- that's a challenge!

-Chris




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]


Re: [Python-Dev] [Python-checkins] cpython: In-line the append operations inside deque_inplace_repeat().

2015-09-14 Thread Raymond Hettinger

> On Sep 14, 2015, at 12:49 PM, Brett Cannon  wrote:
> 
> Would it be worth adding a comment that the block of code is an inlined copy 
> of deque_append()?
> Or maybe even turn the append() function into a macro so you minimize code 
> duplication?

I don't think either would be helpful.  The point of the inlining was to let 
the code evolve independently from deque_append().   

Once separated from the mother ship, the code in deque_inplace_repeat() could 
now shed the unnecessary work.  The state variable is updated once.  The 
updates within a single block are now in their own inner loop. The deque size 
is updated outside of that loop, etc.   In other words, they are no longer the 
same code.
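
The shape of the separated loop can be modelled in Python. This is a rough sketch, not the C implementation: the list of fixed-size lists stands in for the linked blocks, the item starts in slot 0 rather than in the middle of a block, reference counting is ignored, and `repeat_single_item` is a made-up name. BLOCKLEN = 64 is inferred from the comparison against 63 in the listing below.

```python
from collections import deque

BLOCKLEN = 64  # inferred from `cmpq $63`, i.e. BLOCKLEN - 1


def repeat_single_item(blocks, rightindex, item, n):
    """Hypothetical model: append `item` n-1 times to the rightmost block.

    Returns (new_rightindex, items_added). The caller adds items_added to
    the size once, after the loop, mirroring `Py_SIZE(deque) += i`.
    """
    added = 0
    for _ in range(n - 1):
        if rightindex == BLOCKLEN - 1:       # current block is full
            blocks.append([None] * BLOCKLEN)
            rightindex = -1
        rightindex += 1
        blocks[-1][rightindex] = item        # the single store per item
        added += 1
    return rightindex, added


blocks = [[None] * BLOCKLEN]
blocks[0][0] = "x"                           # a deque holding one item
size = 1
rightindex, added = repeat_single_item(blocks, 0, "x", 100)
size += added                                # size updated once, outside the loop

# The public behaviour this C function implements: in-place repeat.
d = deque(["x"])
d *= 100
print(size, len(d), len(blocks), rightindex)
```

Nothing in the loop body touches the size or the state counter, which is the point of keeping the code separate from deque_append().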

The original append-in-a-loop version was already being in-lined by the 
compiler but was doing way too much work.  For each item written in the 
original, there were 7 memory reads, 5 writes, 6 predictable 
compare-and-branches, and 5 add/sub operations.  In the current form, there are 
0 reads, 1 write, 2 predictable compare-and-branches, and 3 add/sub operations.

FWIW, my work flow is that periodically I expand the code with new features 
(the upcoming work is to add slicing support 
http://bugs.python.org/issue17394), then once it is correct and tested, I make 
a series of optimization passes (such as the work I just described above).  
After that, I come along and factor out common code, usually with clean, 
in-lineable functions rather than macros (such as the recent check-in replacing 
redundant code in deque_repeat with a call to the common code in 
deque_inplace_repeat).

My schedule lately hasn't given me any big blocks of time to work with, so I do 
the steps piecemeal as I get snippets of development time.


Raymond


P.S. For those who are interested, here is the before and after:

---- before ----
L1152:
    movq    __Py_NoneStruct@GOTPCREL(%rip), %rdi
    cmpq    $0, (%rdi)                  <
    je      L1257
L1159:
    addq    $1, %r13
    cmpq    %r14, %r13
    je      L1141
    movq    16(%rbx), %rsi              <
L1142:
    movq    48(%rbx), %rdx              <
    addq    $1, 56(%rbx)                <>
    cmpq    $63, %rdx
    je      L1143
    movq    32(%rbx), %rax              <
    addq    $1, %rdx
L1144:
    addq    $1, 0(%rbp)                 <>
    leaq    1(%rsi), %rcx
    movq    %rdx, 48(%rbx)              >
    movq    %rcx, 16(%rbx)              >
    movq    %rbp, 8(%rax,%rdx,8)        >
    movq    64(%rbx), %rax              <
    cmpq    %rax, %rcx
    jle     L1152
    cmpq    $-1, %rax
    je      L1152


---- after ----
L777:
    cmpq    $63, %rdx
    je      L816
L779:
    addq    $1, %rdx
    movq    %rbp, 16(%rsi,%rbx,8)       <
    addq    $1, %rbx
    leaq    (%rdx,%r9), %rcx
    subq    %r8, %rcx
    cmpq    %r12, %rbx
    jl      L777

    # outside the inner-loop
    movq    %rdx, 48(%r13)
    movq    %rcx, 0(%rbp)
    cmpq    %r12, %rbx
    jl      L780


Re: [Python-Dev] [Python-checkins] cpython: In-line the append operations inside deque_inplace_repeat().

2015-09-14 Thread Brett Cannon
On Mon, 14 Sep 2015 at 15:37 Raymond Hettinger wrote:

> > On Sep 14, 2015, at 12:49 PM, Brett Cannon wrote:
> >
> > Would it be worth adding a comment that the block of code is an inlined
> > copy of deque_append()?
> > Or maybe even turn the append() function into a macro so you minimize
> > code duplication?
>
> I don't think either would be helpful.  The point of the inlining was to
> let the code evolve independently from deque_append().
>

OK, commit message just didn't point that out as the reason for the
inlining (I guess in the future call it a fork of the code to know it is
meant to evolve independently?).

-Brett

