Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-23 Thread Patrascu, Alecsandru
I removed the zip file and uploaded the patches individually. 

Alecsandru

From: Brett Cannon [mailto:[email protected]] 
Sent: Sunday, August 23, 2015 4:47 AM
To: Patrascu, Alecsandru; [email protected]
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default


On Sat, 22 Aug 2015 at 11:10 Patrascu, Alecsandru wrote:
I'm sorry, I forgot to mention this, I already opened an issue and the patches 
are uploaded [1].

[1] http://bugs.python.org/issue24915

Great, thanks Alecsandru. Do please follow Stefan's comment, though, and upload 
the patch files directly and not as a zip file. That way we can use our code 
review tool to do a proper review of the patches.

-Brett
 


From: Brett Cannon [mailto:[email protected]]
Sent: Saturday, August 22, 2015 9:00 PM
To: Patrascu, Alecsandru; [email protected]
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I just realized I didn't see anyone say it, but please upload the patches to 
bugs.Python.org for easier tracking and reviewing.

On Sat, Aug 22, 2015, 08:01 Patrascu, Alecsandru wrote:
Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel 
Corporation.

I would like to submit a request to turn on Profile Guided Optimization (PGO) 
as the default build option for Python (both 2.7 and 3.6), given its 
performance benefits on a wide variety of workloads and hardware.  For 
instance, as the attached sample performance results from the Grand 
Unified Python Benchmark show, a speedup of >20% was observed.  In addition, 
we are seeing a 2-9% performance boost on OpenStack/Swift, where more than 60% 
of the code is in Python 2.7. Our analysis indicates that the performance gain 
is mainly due to a reduction in icache misses and CPU front-end stalls.

Attached are the Makefile patches that modify the "all" build target and add a 
new one called "disable-profile-opt". We built and tested these patches for 
Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel 
Xeon Haswell/Broadwell with 18/8 cores).  We use the "regrtest" suite for 
training as it provides the best performance improvement.  Some of the test 
programs in the suite may fail, which causes the build to fail.  One solution 
is to disable the specific failing test using the "-x " flag (as shown in the 
patch).

Steps to apply the patch:
1.  hg clone https://hg.python.org/cpython cpython
2.  cd cpython
3.  hg update 2.7 (needed for 2.7 only)
4.  Copy *.patch to the current directory
5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
6.  ./configure
7.  make

To disable PGO
7b. make disable-profile-opt
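
The steps above can be sketched as a small dry-run script. This is only an
illustration and not part of the patches: the function names pgo_build_steps
and print_steps are made up here, the patch is assumed to have been copied
into the checkout by hand (step 4), and nothing is executed, the commands are
only listed.

```python
def pgo_build_steps(version="3.6", disable_pgo=False):
    """Return the commands from the steps above as (argv, cwd, stdin_file) tuples.

    Hypothetical helper: it only describes the commands, it does not run them.
    """
    patch_file = "python%s-pgo.patch" % version
    steps = [
        # Step 1: clone the repository into ./cpython
        (["hg", "clone", "https://hg.python.org/cpython", "cpython"], ".", None),
    ]
    if version == "2.7":
        # Step 3: only needed for the 2.7 branch
        steps.append((["hg", "update", "2.7"], "cpython", None))
    # Step 5: "patch < file" means the patch file is fed on standard input
    steps.append((["patch"], "cpython", patch_file))
    # Step 6
    steps.append((["./configure"], "cpython", None))
    # Step 7, or step 7b when PGO should be disabled
    make = ["make", "disable-profile-opt"] if disable_pgo else ["make"]
    steps.append((make, "cpython", None))
    return steps


def print_steps(steps):
    """Print each command the way it would be typed in a shell."""
    for argv, cwd, stdin_file in steps:
        line = " ".join(argv)
        if stdin_file is not None:
            line += " < " + stdin_file
        print("(in %s) %s" % (cwd, line))


print_steps(pgo_build_steps("2.7"))
```

Passing disable_pgo=True replaces the final "make" with
"make disable-profile-opt", matching step 7b.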

Below are our sample performance results from our latest Xeon machine, a Xeon 
Broadwell EP.
Hardware (HW):      Intel XEON (Broadwell) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.6GHz by
                        echo 260 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 260 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to
                    reduce run-to-run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source: https://hg.python.org/benchmarks/
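
Before running the benchmarks, the OS configuration listed above (pinned CPU
frequency, ASLR disabled) could be checked with a short script along these
lines. This is a sketch under assumptions: the helper names are invented for
illustration, and it only reads the same procfs/sysfs files that the echo
commands above write to, reporting False when a file is missing or unreadable.

```python
import glob

ASLR_PATH = "/proc/sys/kernel/randomize_va_space"
FREQ_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/scaling_%s_freq"


def read_setting(path):
    """Return the stripped contents of a procfs/sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except (IOError, OSError):
        return None


def aslr_disabled(value):
    """ASLR is off when randomize_va_space contains 0."""
    return value == "0"


def frequency_pinned(min_values, max_values):
    """True when every core reports one and the same min and max frequency."""
    values = set(min_values) | set(max_values)
    return None not in values and len(values) == 1


def report():
    """Print whether the current machine matches the benchmark configuration."""
    mins = [read_setting(p) for p in sorted(glob.glob(FREQ_GLOB % "min"))]
    maxs = [read_setting(p) for p in sorted(glob.glob(FREQ_GLOB % "max"))]
    print("ASLR disabled: %s" % aslr_disabled(read_setting(ASLR_PATH)))
    print("CPU frequency pinned: %s" % (bool(mins) and frequency_pinned(mins, maxs)))


if __name__ == "__main__":
    report()
```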

Python2.7 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    Python source: hg update 2.7
    hg id: 0511b1165bb6 (2.7)
    hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
    hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

        Benchmarks          Speedup(%)
        simple_logging      20
        raytrace            20
        silent_logging      19
        richards            19
        chaos               16
        formatted_logging   16
        json_dump           15
        hexiom2             13
        pidigits            12
        slowunpickle        12
        django_v2           12
        unpack_sequence     11
        float               11
        mako                11
        slowpickle          11
        fastpickle          11
        django              11
        go                  10
        json_dump_v2        10
        pathlib             10
        regex_compile       10
        pybench             9.9
        etree_process       9
        regex_v8            8
        bzr_startup         8
        2to3                8
        slowspitfire        8
        telco               8
        pickle_list         8
        fannkuch            8
        etree_iterparse     8
        nqueens             8
        mako_v2             8
        etree_generate      8
        call_method_slots   7
        html5lib_warmup     7
        html5lib      

Re: [Python-Dev] tp_finalize vs tp_del semantics

2015-08-23 Thread Armin Rigo
Hi Valentine,

On 19 August 2015 at 09:53, Valentine Sinitsyn wrote:
> why it wasn't possible to
> implement proposed CI disposal scheme on top of tp_del?

I'm replying here as best as I understand the situation, which might
be incomplete or wrong.

From the point of view of someone writing a C extension module, both
tp_del and tp_finalize are called with the same guarantee that the
object is still valid at that point.  The difference is only that the
presence of tp_del prevents the object from being collected at all if
it is part of a cycle.  Maybe the same could have been done without
duplicating the function pointer (tp_del + tp_finalize) with a
Py_TPFLAGS_DEL_EVEN_IN_A_CYCLE.
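
The difference described above can be observed from pure Python. The sketch
below assumes CPython 3.4 or later, where a Python-level __del__ is run
through tp_finalize; the class and variable names are only illustrative. On
such an interpreter both objects in the cycle are expected to be finalized
and collected, rather than being left uncollected as an object with tp_del
would be.

```python
import gc

finalized = []  # names of objects whose finalizer has run


class Node:
    """Object with a finalizer; two of these will reference each other."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


def make_cycle():
    # Build a two-object reference cycle and drop all outside references.
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a


gc.disable()      # keep automatic collections out of the picture
make_cycle()
gc.collect()      # the cycle is only reachable by the cycle collector
gc.enable()

print("finalized:", sorted(finalized))
print("left in gc.garbage:", [o for o in gc.garbage if isinstance(o, Node)])
```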


A bientôt,

Armin.