Ping.

On 08/03/2015 11:40 AM, Mikhail Maltsev wrote:
> On Jul 26, 2015, at 11:50 AM, Andi Kleen <a...@firstfloor.org> wrote:
>> I've been compiling gcc with tcmalloc to do a similar speedup. It would be
>> interesting to compare that to your patch.
> I repeated the test with TCMalloc and jemalloc. TCMalloc shows nice results,
> though it required some tweaks: this allocator has a threshold block size of
> 32 KB, and larger blocks are allocated from the global heap rather than the
> thread cache (an expensive operation), so the original patch performs worse
> when used with TCMalloc. To fix this, I reduced the block size to 8 KB.
> There are 5 columns for each value: the pristine version, the pristine
> version + TCMalloc (with the difference in parentheses), and the patched
> version with TCMalloc (difference also relative to the pristine version).
> The memory usage table follows the same layout.
> 
> 400.perlbench        26.86  26.17 (  -2.57%)  26.17 (  -2.57%) user
>                       0.56   0.64 ( +14.29%)   0.61 (  +8.93%) sys
>                      27.45  26.84 (  -2.22%)  26.81 (  -2.33%) real
> 401.bzip2             2.53    2.5 (  -1.19%)   2.48 (  -1.98%) user
>                       0.07   0.09 ( +28.57%)    0.1 ( +42.86%) sys
>                       2.61    2.6 (  -0.38%)   2.59 (  -0.77%) real
> 403.gcc              73.59  72.62 (  -1.32%)  71.72 (  -2.54%) user
>                       1.59   1.88 ( +18.24%)   1.88 ( +18.24%) sys
>                      75.27  74.58 (  -0.92%)  73.67 (  -2.13%) real
> 429.mcf                0.4   0.41 (  +2.50%)    0.4 (  +0.00%) user
>                       0.03   0.05 ( +66.67%)   0.05 ( +66.67%) sys
>                       0.44   0.47 (  +6.82%)   0.47 (  +6.82%) real
> 433.milc              3.22   3.24 (  +0.62%)   3.25 (  +0.93%) user
>                       0.22   0.32 ( +45.45%)    0.3 ( +36.36%) sys
>                       3.48   3.59 (  +3.16%)   3.59 (  +3.16%) real
> 444.namd              7.54   7.41 (  -1.72%)   7.37 (  -2.25%) user
>                        0.1   0.15 ( +50.00%)   0.15 ( +50.00%) sys
>                       7.66   7.58 (  -1.04%)   7.54 (  -1.57%) real
> 445.gobmk            20.24  19.59 (  -3.21%)   19.6 (  -3.16%) user
>                       0.52   0.67 ( +28.85%)   0.59 ( +13.46%) sys
>                       20.8  20.29 (  -2.45%)  20.23 (  -2.74%) real
> 450.soplex           19.08  18.47 (  -3.20%)  18.51 (  -2.99%) user
>                       0.87   1.11 ( +27.59%)   1.06 ( +21.84%) sys
>                      19.99  19.62 (  -1.85%)   19.6 (  -1.95%) real
> 453.povray           42.27  41.42 (  -2.01%)  41.32 (  -2.25%) user
>                       2.71   3.11 ( +14.76%)   3.09 ( +14.02%) sys
>                      45.04  44.58 (  -1.02%)  44.47 (  -1.27%) real
> 456.hmmer             7.27   7.22 (  -0.69%)   7.15 (  -1.65%) user
>                       0.31   0.36 ( +16.13%)   0.39 ( +25.81%) sys
>                       7.61   7.61 (  +0.00%)   7.57 (  -0.53%) real
> 458.sjeng             3.22   3.14 (  -2.48%)   3.15 (  -2.17%) user
>                       0.09   0.16 ( +77.78%)   0.14 ( +55.56%) sys
>                       3.32   3.32 (  +0.00%)    3.3 (  -0.60%) real
> 462.libquantum        0.86   0.87 (  +1.16%)   0.85 (  -1.16%) user
>                       0.05   0.08 ( +60.00%)   0.08 ( +60.00%) sys
>                       0.92   0.96 (  +4.35%)   0.94 (  +2.17%) real
> 464.h264ref          27.62  27.27 (  -1.27%)  27.16 (  -1.67%) user
>                       0.63   0.73 ( +15.87%)   0.75 ( +19.05%) sys
>                      28.28  28.03 (  -0.88%)  27.95 (  -1.17%) real
> 470.lbm               0.27   0.27 (  +0.00%)   0.27 (  +0.00%) user
>                       0.01   0.01 (  +0.00%)   0.01 (  +0.00%) sys
>                       0.29   0.29 (  +0.00%)   0.29 (  +0.00%) real
> 471.omnetpp          28.29  27.63 (  -2.33%)  27.54 (  -2.65%) user
>                        1.5   1.57 (  +4.67%)   1.62 (  +8.00%) sys
>                      29.84  29.25 (  -1.98%)  29.21 (  -2.11%) real
> 473.astar             1.14   1.12 (  -1.75%)   1.11 (  -2.63%) user
>                       0.05   0.07 ( +40.00%)   0.09 ( +80.00%) sys
>                       1.21   1.21 (  +0.00%)    1.2 (  -0.83%) real
> 482.sphinx3           4.65   4.57 (  -1.72%)   4.59 (  -1.29%) user
>                        0.2    0.3 ( +50.00%)   0.26 ( +30.00%) sys
>                       4.88   4.89 (  +0.20%)   4.88 (  +0.00%) real
> 483.xalancbmk        284.5  276.4 (  -2.85%) 276.48 (  -2.82%) user
>                      20.29  23.03 ( +13.50%)  22.82 ( +12.47%) sys
>                     305.19 299.79 (  -1.77%) 299.67 (  -1.81%) real
> 
> 400.perlbench     102308kB    123004kB  (  +20696kB)    116104kB  (  +13796kB)
> 401.bzip2          74628kB     86936kB  (  +12308kB)     84316kB  (   +9688kB)
> 403.gcc           190284kB    218180kB  (  +27896kB)    212480kB  (  +22196kB)
> 429.mcf            19804kB     24464kB  (   +4660kB)     24320kB  (   +4516kB)
> 433.milc           36940kB     45308kB  (   +8368kB)     44652kB  (   +7712kB)
> 444.namd          183548kB    193856kB  (  +10308kB)    192632kB  (   +9084kB)
> 445.gobmk          73724kB     78792kB  (   +5068kB)     79192kB  (   +5468kB)
> 450.soplex         62076kB     67596kB  (   +5520kB)     66856kB  (   +4780kB)
> 453.povray        180620kB    208480kB  (  +27860kB)    207576kB  (  +26956kB)
> 456.hmmer          39544kB     47380kB  (   +7836kB)     46776kB  (   +7232kB)
> 458.sjeng          40144kB     48652kB  (   +8508kB)     47608kB  (   +7464kB)
> 462.libquantum     23464kB     28576kB  (   +5112kB)     28260kB  (   +4796kB)
> 464.h264ref       708760kB    738400kB  (  +29640kB)    734224kB  (  +25464kB)
> 470.lbm            26552kB     31684kB  (   +5132kB)     31348kB  (   +4796kB)
> 471.omnetpp       152000kB    172924kB  (  +20924kB)    167204kB  (  +15204kB)
> 473.astar          27036kB     31472kB  (   +4436kB)     31380kB  (   +4344kB)
> 482.sphinx3        33100kB     40812kB  (   +7712kB)     39496kB  (   +6396kB)
> 483.xalancbmk     368844kB    393292kB  (  +24448kB)    393032kB  (  +24188kB)
> 
> 
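For anyone re-checking the tables: the percentage columns are plain relative
differences against the pristine build. A minimal sketch of that arithmetic
follows; the function name and the rounding to two decimals are my assumptions
for illustration, not taken from the measurement scripts.

```cpp
#include <cmath>

// Relative change of `measured` versus `baseline`, in percent,
// rounded to two decimal places (hypothetical helper).
double relative_diff_percent(double baseline, double measured)
{
  double pct = (measured - baseline) / baseline * 100.0;
  return std::round(pct * 100.0) / 100.0;
}
```

With the 400.perlbench user times (26.86 s pristine, 26.17 s with TCMalloc)
this gives the -2.57% shown in the table. The memory columns are simple
differences in kB.
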
> jemalloc causes a regression (which is rather surprising, because my previous
> tests showed the opposite result, but those tests had a very small workload:
> in fact, a single file).
> 
> On 07/27/2015 12:13 PM, Richard Biener wrote:
>>>> On Jul 26, 2015, at 11:50 AM, Andi Kleen <a...@firstfloor.org> wrote:
>>>> Another useful optimization is to adjust the allocation size to be >=
>>>> 2MB. Then modern Linux kernels often can give you a large page,
>>>> which cuts down TLB overhead. I did similar changes some time
>>>> ago for the garbage collector.
>>>
>>> Unless you are running with 64k pages, which I do all the time on my
>>> armv8 system.
>>
>> This can be a host configurable value of course.
> Yes, I actually mentioned that among possible enhancements. I think the code
> from ggc-page.c can be reused (it already implements querying the page size
> from the OS).
> 
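To illustrate the host-configurable block size, here is a minimal sketch,
assuming a POSIX host. It is not the ggc-page.c code; os_page_size, round_up
and pool_block_size are hypothetical names, and the 4 KiB fallback is my
assumption.

```cpp
#include <cstddef>
#include <unistd.h>

// Ask the OS for its page size; fall back to 4 KiB if sysconf fails.
std::size_t os_page_size()
{
  long sz = sysconf(_SC_PAGESIZE);
  return sz > 0 ? static_cast<std::size_t>(sz) : 4096;
}

// Round `bytes` up to the next multiple of `align` (align must be non-zero).
std::size_t round_up(std::size_t bytes, std::size_t align)
{
  return (bytes + align - 1) / align * align;
}

// Hypothetical helper: make the pool block size a whole number of pages,
// so it works for 4k and 64k page configurations alike.
std::size_t pool_block_size(std::size_t requested)
{
  return round_up(requested, os_page_size());
}
```

The same round_up could be called with 2 MB as the alignment to experiment
with the large-page suggestion; whether the kernel then actually backs the
block with a large page depends on its configuration.
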
>> But first of all (without looking at the patch but just reading the
>> description) this sounds like a good idea.  Maybe still allow pools to use
>> their own backing if the object size is larger than the block size of the
>> caching pool?
> Yes, I thought about it, but I was not sure whether this should be
> implemented in advance. I attached the updated patch.
> 
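A rough sketch of what such a fallback might look like, to make the idea
concrete. This is not code from the patch: block_cache, backing, get_backing
and put_backing are hypothetical names, and plain malloc/free stand in for
whatever backing the pools would really use.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical cache of fixed-size blocks shared by all pools.
class block_cache
{
public:
  explicit block_cache(std::size_t block_size) : m_block_size(block_size) {}
  ~block_cache() { for (void *b : m_free) std::free(b); }
  block_cache(const block_cache &) = delete;
  block_cache &operator=(const block_cache &) = delete;

  std::size_t block_size() const { return m_block_size; }
  std::size_t cached() const { return m_free.size(); }

  // Reuse a cached block if one is available, else get one from malloc.
  void *acquire()
  {
    if (!m_free.empty())
      {
        void *b = m_free.back();
        m_free.pop_back();
        return b;
      }
    return std::malloc(m_block_size);
  }

  // Keep the block for later reuse instead of freeing it.
  void release(void *b) { m_free.push_back(b); }

private:
  std::size_t m_block_size;
  std::vector<void *> m_free;
};

// Remembers where the memory came from so it can be returned correctly.
struct backing { void *ptr; bool from_cache; };

// Objects that fit in a cached block use the cache; larger ones get
// their own backing straight from malloc.
backing get_backing(block_cache &cache, std::size_t object_size)
{
  if (object_size <= cache.block_size())
    return { cache.acquire(), true };
  return { std::malloc(object_size), false };
}

void put_backing(block_cache &cache, backing b)
{
  if (b.from_cache)
    cache.release(b.ptr);
  else
    std::free(b.ptr);
}
```

The from_cache flag is the only extra state a pool would need to carry in
this sketch; oversized objects never enter the cache, so cached blocks stay
uniform in size.
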

-- 
Regards,
    Mikhail Maltsev
