New submission from Alecsandru Patrascu: Title: Link Time Optimizations support for GCC and CLANG
Hi All,

This is Alecsandru from the Server Scripting Languages Optimization team at Intel Corporation.

I would like to submit a patch that adds support for Link Time Optimization (LTO) when using GCC and CLANG to compile CPython2 and CPython3. LTO is a compiler-assisted optimization technique that is performed by the compiler at link time. Combined with Profile Guided Optimization (PGO), enabled when running "make profile-opt", and measured with the Grand Unified Python Benchmark (GUPB), PGO+LTO showed a speedup of up to 11%, with a few regressions, compared with PGO only. Compared with a default build, a performance gain as high as 26% was observed from PGO+LTO. In addition, we are also seeing a 2% boost in throughput rate from our OpenStack Swift setup compared with PGO only.

Our GUPB performance evaluation was conducted on Intel SkyLake/Broadwell systems running CentOS/Ubuntu, with CLANG/LLVM and GCC 4.*/5.*. Our OpenStack Swift performance evaluation was done on various systems consisting of XEON and Avoton processors.

Steps:
======
1. Get the CPython source code
    hg clone https://hg.python.org/cpython cpython
    cd cpython
    hg update 2.7 (for CPython2)

2. Build the binary
    a) Default:
        ./configure
        make
    b) PGO:
        ./configure
        make profile-opt
    c) PGO+LTO:
        Copy the attached patch files
        hg import --no-commit lto-cpython3-v01.patch (for CPython3)
        hg import --no-commit lto-cpython2-v01.patch (for CPython2)
        ./configure
        make profile-opt

Hardware and OS Configuration
=============================
Hardware:           Intel XEON (Broadwell-DE) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

OS:                 Ubuntu 14.04.3 LTS Server

OS configuration:   Address Space Layout Randomization (ASLR) disabled to reduce run-to-run variation by
                    echo 0 > /proc/sys/kernel/randomize_va_space
                    CPU frequency set fixed at 2.6GHz

GCC version:        GCC version 4.9.2

Benchmark:          Grand Unified Python Benchmark from https://hg.python.org/benchmarks/

Measurements and Results
========================
A.
Repository:
    GUPB Benchmark:
        hg id                              : 2979f5ce6a0c tip
        hg --debug id -i                   : 2979f5ce6a0cee994d5485401945d8457bb0afac

    CPython3:
        hg id                              : 21a28f6de358
        hg id -r 'ancestors(.) and tag()'  : 374f501f4567 (3.5) v3.5.0
        hg --debug id -i                   : 21a28f6de3582833652c958b8fd6ae8448b61c7c

    CPython2:
        hg id                              : a37ea1d56e98 (2.7)
        hg id -r 'ancestors(.) and tag()'  : 15c95b7d81dc (2.7) v2.7.10
        hg --debug id -i                   : a37ea1d56e98eb158750d3e495a5cf524e8c3980

B. Results:

CPython2 and CPython3 sample results, measured on a Broadwell platform, are shown in Tables 1 and 2. The first column (Benchmark) gives the benchmark name, the second (%D) the speedup compared with the default build, and the third (%PGO) the speedup compared with PGO only; a higher value is better.

Table 1. CPython2 results:

    Benchmark              %D   %PGO
    --------------------------------
    raytrace               18      3
    chaos                  16      5
    django_v2              16      6
    mako                   16      6
    pathlib                15      3
    simple_logging         15      1
    slowpickle             15      5
    django                 14      4
    go                     14      4
    richards               13     -1
    float                  12      4
    slowunpickle           12      4
    etree_process          11      3
    fastunpickle           11      6
    formatted_logging      11      3
    nqueens                11      1
    regex_compile          11      3
    etree_iterparse        10      4
    mako_v2                10      3
    telco                  10      5
    pybench                 9      1
    hexiom2                 9      1
    html5lib_warmup         9      3
    meteor_contest          9      4
    pickle_list             9      5
    2to3                    8      2
    bzr_startup             8      2
    chameleon               8      0
    etree_generate          8      2
    regex_v8                8      3
    silent_logging          8      1
    fannkuch                7      1
    html5lib                7      3
    json_load               7     -5
    tornado_http            7      3
    call_method_slots       6      3
    json_dump_v2            6     -4
    spambayes               6      2
    unpickle_list           6      0
    etree_parse             5      3
    fastpickle              5      4
    rietveld                5      1
    call_method             4     -1
    normal_startup          4      2
    startup_nosite          4      2
    slowspitfire            3      0
    ssbench                 4      2
    call_method_unknown     1     -6
    json_dump               1     -4
    nbody                   1      1
    pidigits                1    -10
    pickle_dict             0     -1
    regex_effbot            0     -2
    spectral_norm           0     -3
    call_simple            -3     -3
    unpack_sequence        -6     -2

Table 2.
CPython3 results:

    Benchmark              %D   %PGO
    --------------------------------
    formatted_logging      26     11
    raytrace               24      8
    simple_logging         24      6
    richards               22      3
    chaos                  21      7
    go                     21     11
    hexiom2                21      8
    nbody                  21      9
    etree_generate         19      5
    etree_process          19      5
    call_method_slots      18      3
    fastunpickle           18      0
    pathlib                18      5
    regex_compile          18      8
    float                  17      8
    nqueens                17      7
    call_method            16      3
    etree_iterparse        16      9
    json_dump              16     -4
    json_load              16      5
    silent_logging         15      8
    2to3                   14      5
    fannkuch               14      8
    call_simple            12      0
    meteor_contest         12      7
    call_method_unknown    11     -1
    spectral_norm          11      4
    json_dump_v2           10      3
    telco                  10      5
    fastpickle              9     -4
    etree_parse             8      1
    normal_startup          8      3
    startup_nosite          7      3
    unpack_sequence         7      3
    regex_v8                6      4
    unpickle_list           5      3
    pickle_list             1    -10
    pidigits                1    -11
    regex_effbot           -2      2
    pickle_dict            -3    -10

Thank you,
Alecsandru

----------
components: Build
messages: 255140
nosy: alecsandru.patrascu
priority: normal
severity: normal
status: open
title: Link Time Optimizations support for GCC and CLANG
type: performance
versions: Python 2.7, Python 3.5, Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25702>
_______________________________________
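
As a cross-check of the headline CPython3 figures quoted in the summary (up to 26% over the default build, up to 11% over PGO only, with a few regressions), the Table 2 rows can be tallied with a short script. This is only a sketch: the helper names parse_rows and summarize are illustrative and are not part of the submitted patch or of GUPB; the data is copied verbatim from Table 2.

```python
# Rows copied from Table 2 (CPython3): benchmark name, %D, %PGO.
TABLE_2 = """
formatted_logging 26 11
raytrace 24 8
simple_logging 24 6
richards 22 3
chaos 21 7
go 21 11
hexiom2 21 8
nbody 21 9
etree_generate 19 5
etree_process 19 5
call_method_slots 18 3
fastunpickle 18 0
pathlib 18 5
regex_compile 18 8
float 17 8
nqueens 17 7
call_method 16 3
etree_iterparse 16 9
json_dump 16 -4
json_load 16 5
silent_logging 15 8
2to3 14 5
fannkuch 14 8
call_simple 12 0
meteor_contest 12 7
call_method_unknown 11 -1
spectral_norm 11 4
json_dump_v2 10 3
telco 10 5
fastpickle 9 -4
etree_parse 8 1
normal_startup 8 3
startup_nosite 7 3
unpack_sequence 7 3
regex_v8 6 4
unpickle_list 5 3
pickle_list 1 -10
pidigits 1 -11
regex_effbot -2 2
pickle_dict -3 -10
"""


def parse_rows(text):
    """Turn 'name %D %PGO' lines into (name, int, int) tuples."""
    rows = []
    for line in text.strip().splitlines():
        name, vs_default, vs_pgo = line.split()
        rows.append((name, int(vs_default), int(vs_pgo)))
    return rows


def summarize(rows):
    """Report the best gains and the benchmarks that regress against PGO only."""
    return {
        "max_vs_default": max(d for _, d, _ in rows),
        "max_vs_pgo": max(p for _, _, p in rows),
        "regressions_vs_pgo": [name for name, _, p in rows if p < 0],
    }


if __name__ == "__main__":
    print(summarize(parse_rows(TABLE_2)))
```

For the Table 2 data this reports a maximum of 26 for %D, a maximum of 11 for %PGO, and six benchmarks with a negative %PGO value.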