Re: [PATCH] gcc parallel make check

Mike Stump Fri, 12 Sep 2014 16:43:57 -0700

On Sep 12, 2014, at 9:32 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Here is my latest version of the patch.


I did a timing test:

Before:

real    0m57.198s
user    1m24.736s
sys     0m19.816s

After:

real    0m28.224s
user    1m27.823s
sys     0m22.374s

This is a -j70 run of check-objc on a 64-core POWER7.  I picked an obscure 
test suite that I had every reason to believe had been ignored, that certainly 
had not been engineered for, and that is kinda small, to ensure the overhead 
would penalize it…  50.66% faster.  There is still room for improvement:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 7  0      0 99046848 8515072 16748672    0    0     0     0    1    1  0  0 100  0  0
 7  0      0 99050432 8515072 16748736    0    0     0     0 7501 9022 13  3 84  0  0
 7  0      0 99029376 8515072 16749248    0    0     0     0 7320 8777 10  2 88  0  0
 7  0      0 99070656 8515072 16749440    0    0     0  1524 7162 8156  9  2 88  1  0
 7  0      0 99034560 8515072 16749824    0    0     0     0 8096 10363  7  2 91  0  0
 7  0      0 99030080 8515072 16750720    0    0     0     0 8798 11673  8  3 90  0  0
 9  0      0 99037376 8515072 16750080    0    0     0     0 9151 12598  9  3 87  0  0
 7  0      0 99024128 8515136 16750656    0    0     0     0 9078 13168  7  3 90  0  0
10  0      0 99034496 8515136 16751488    0    0     0  1800 8633 11675  8  3 88  1  0
 8  0      0 98986304 8515136 16751296    0    0     0     0 10159 14553  7  3 90  0  0
 7  0      0 99010112 8515520 16765824    0    0     0     0 8814 12036 10  3 87  0  0
 4  0      0 99014016 8515648 16773568    0    0     0     0 8091 10445  8  3 90  0  0
 4  0      0 99064832 8515712 16773120    0    0     0     0 5416 5071  9  2 89  0  0
 3  0      0 99118976 8515712 16773184    0    0     0 12716 4743 3533  4  1 92  2  0
 3  0      0 99077504 8515840 16773248    0    0     0     0 4525 3988  3  1 96  0  0
 2  0      0 99121152 8515840 16773824    0    0     0     0 4687 3757  3  1 97  0  0
 2  0      0 99117056 8515840 16773632    0    0     0     0 4334 3156  3  1 96  0  0
 2  0      0 99105728 8515840 16774336    0    0     0     0 4355 3246  3  1 96  0  0
 3  0      0 99069120 8515904 16773632    0    0     0   648 4902 4037  2  1 97  0  0
 1  0      0 99153664 8515968 16774592    0    0     0     0 3776 2711  2  1 97  0  0
 1  0      0 99151232 8515968 16774400    0    0     0     0  877  205  4  0 96  0  0
 1  0      0 99151424 8516032 16774528    0    0     0   236  774  466  2  0 97  0  0
 2  0      0 99148032 8516032 16774656    0    0     0     0  853  350  2  0 98  0  0
 2  0      0 99146176 8516032 16774656    0    0     0  1208 1630 1363  1  0 99  0  0
 1  0      0 99156032 8516352 16777152    0    0     0     0 1919 2104  1  0 99  0  0
 0  0      0 99189376 8516416 16776512    0    0     0     0 1181  799  2  0 98  0  0
 0  0      0 99189312 8516416 16776512    0    0     0     0  118   18  0  0 100  0  0
 0  0      0 99189312 8516416 16776512    0    0     0     0   90   18  0  0 100  0  0
 0  0      0 99187968 8516416 16776512    0    0     0  5468  196   42  0  0 100  0  0
 0  0      0 99187968 8516416 16776512    0    0     0     0   92   24  0  0 100  0  0
 0  0      0 99188032 8516416 16776512    0    0     0     0  146   37  0  0 100  0  0
 0  0      0 99188160 8516416 16776512    0    0     0   128   91   36  0  0 100  0  0
 1  0      0 99188160 8516416 16776512    0    0     0     0   74   16  0  0 100  0  0
 0  0      0 99188160 8516416 16776512    0    0     0     0   72   20  0  0 100  0  0
 0  0      0 99188224 8516416 16776512    0    0     0     0   76   22  0  0 100  0  0
 0  0      0 99188224 8516416 16776512    0    0     0     0  118   29  0  0 100  0  0

which averages to 95% idle.  I changed:

check_objc_parallelize = 6

to 

check_objc_parallelize = 70

to try and get it to go faster:

real    0m21.252s
user    3m21.035s
sys     1m9.937s

:-(  7 seconds (24.7%) faster, but consumes 146% more CPU time (user + sys) 
to see the benefit.
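
For anyone who wants to recheck the percentages in this message, they are plain arithmetic on the `time` output. Below is a minimal sketch of that arithmetic. The helper names `to_seconds` and `compare` are made up for illustration and are not part of the patch, and "resources" is assumed to mean user + sys CPU time, an interpretation that happens to reproduce the 146% figure.

```python
import re

def to_seconds(stamp):
    """Convert a bash `time` value such as '1m24.736s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

def compare(before, after):
    """Each argument is a (real, user, sys) tuple of `time` strings.

    Returns (percent less wall-clock time, percent more CPU time),
    where CPU time is assumed to mean user + sys.
    """
    b_real, b_user, b_sys = (to_seconds(s) for s in before)
    a_real, a_user, a_sys = (to_seconds(s) for s in after)
    wall_saved = 100.0 * (b_real - a_real) / b_real
    cpu_extra = 100.0 * ((a_user + a_sys) / (b_user + b_sys) - 1.0)
    return wall_saved, cpu_extra

# check-objc at -j70, before the patch
unpatched = ("0m57.198s", "1m24.736s", "0m19.816s")
# after the patch, with check_objc_parallelize still at 6
patched_6 = ("0m28.224s", "1m27.823s", "0m22.374s")
# after the patch, with check_objc_parallelize = 70
patched_70 = ("0m21.252s", "3m21.035s", "1m9.937s")

wall, cpu = compare(unpatched, patched_6)
print(f"patch:   {wall:.2f}% less wall time, {cpu:.1f}% more CPU")
wall, cpu = compare(patched_6, patched_70)
print(f"6 -> 70: {wall:.1f}% less wall time, {cpu:.0f}% more CPU")
```

The same `compare` call works on any other pair of runs quoted here, since they all use the same `time` format.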

With the filesystem update set to 2 (instead of 10):

real    0m22.478s
user    4m38.564s
sys     1m25.293s

and filesystem update 5:

real    0m21.665s
user    3m51.615s
sys     1m16.005s

and filesystem update 20:

real    0m22.681s
user    3m2.746s
sys     1m5.576s

a -j1 filesystem update 20 for comparison:

real    1m48.127s
user    1m17.953s
sys     0m17.191s

a -j1 check_objc_parallelize 6 filesystem update 10 for comparison:

real    1m47.552s
user    1m17.410s
sys     0m16.909s

a -j70 check_objc_parallelize 10000 filesystem update 10 for comparison:

real    0m21.292s
user    3m17.368s
sys     1m10.106s

a -j70 check_objc_parallelize 10000 filesystem update 2 for comparison:

real    0m21.976s
user    4m37.600s
sys     1m26.598s

a -j70 check_objc_parallelize 10000 filesystem update 200 for comparison:

real    1m12.319s
user    2m49.975s
sys     1m4.537s

a -j70 check_objc_parallelize 12 filesystem update 10 for comparison:

real    0m23.176s
user    1m33.100s
sys     0m25.722s

=======================================================

Switching over to check-c…

-j70 before, 94.4% idle:

real    22m38.331s
user    67m11.810s
sys     13m40.974s

-j70 after (71.28% idle):

real    10m41.448s
user    160m24.871s
sys     36m5.220s

143% more resource intensive to get a 52.8% faster check.  I still see a long 
tail on the test suite run (30 seconds per line):

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 96997696 8707392 18756352    0    0     0     0    0    0  0  0 100  0  0
70  6      0 95642688 8709824 18719232    0    0     0  1231 23366 47068 54 23 18  4  0
66 10      0 95591872 8711744 18734976    0    0     0  3131 19437 37168 69 19  6  6  0
66  9      0 94251520 8716352 18780096    0    0     0  3304 18211 34222 70 18  7  6  0
60 16      0 94398400 8732288 18857152    0    0     0  2654 15808 29888 74 16  5  5  0
60 14      0 95059008 8749056 18973888    0    0     0  5678 17521 33177 72 17  6  5  0
60 12      0 94594880 8766656 18981376    0    0     0  2874 15686 28166 72 16  6  6  0
12  2      0 95515520 8773184 18997760    0    0     0  2109 14987 23655 48  9 39  4  0
 6  1      0 96211264 8774144 19010560    0    0     0  2111 5049 4993 14  1 85  0  0
 3  0      0 96441408 8774336 19016640    0    0     0   529 1870  980  7  0 93  0  0
 2  0      0 96493248 8774336 19016128    0    0     0   359  462   79  3  0 97  0  0
 2  0      0 96540992 8774400 19016000    0    0     0   417  458   89  3  0 97  0  0
 1  0      0 96564736 8774400 19012864    0    0     0   277  482  164  2  0 98  0  0
 1  0      0 96566080 8774400 19012928    0    0     0    16  194   31  2  0 98  0  0
 1  0      0 96574208 8774400 19012928    0    0     0     9  185   27  2  0 98  0  0
 1  0      0 96576192 8774400 19012672    0    0     0     9  197   32  2  0 98  0  0
 1  0      0 96584384 8774400 19012736    0    0     0     9  185   26  2  0 98  0  0
 1  0      0 96588608 8774400 19012480    0    0     0     9  187   27  2  0 98  0  0
 1  0      0 96583872 8774400 19012672    0    0     0    18  183   27  2  0 98  0  0
 1  0      0 96579072 8774528 19017472    0    0     0    32  230   55  2  0 98  0  0
 1  0      0 96603264 8774592 19016832    0    0     0    92  373  219  2  0 98  0  0
 1  0      0 96606528 8774592 19017984    0    0     0   111  357  241  2  0 98  0  0
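
An idle figure such as the 95% quoted for the check-objc run is the mean of the `id` column over the vmstat samples. The sketch below shows one way to compute it. The function name `mean_idle` is hypothetical, and it assumes each sample sits on a single line (mail line-wrapping undone), so rows with the wrong number of fields are skipped. Note that the first row vmstat prints is an average since boot. It is kept by default here because the 95% figure only comes out when that row is included.

```python
def mean_idle(vmstat_text, skip_first=False):
    """Return the mean of the `id` column of `vmstat` output."""
    idle_col = None
    n_cols = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "id" in fields:                      # the column-name header row
            idle_col = fields.index("id")
            n_cols = len(fields)
            continue
        if idle_col is None or len(fields) != n_cols:
            continue                            # banner, blank, or wrapped row
        if not all(f.isdigit() for f in fields):
            continue
        samples.append(int(fields[idle_col]))
    if skip_first:
        samples = samples[1:]                   # drop the since-boot average
    if not samples:
        raise ValueError("no vmstat samples found")
    return sum(samples) / len(samples)

sample = """\
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 7  0      0 99046848 8515072 16748672    0    0     0     0    1    1  0  0 100  0  0
 7  0      0 99050432 8515072 16748736    0    0     0     0 7501 9022 13  3 84  0  0
 7  0      0 99029376 8515072 16749248    0    0     0     0 7320 8777 10  2 88  0  0
"""
print(f"{mean_idle(sample):.1f}% idle over these three samples")
```

Passing `skip_first=True` gives the average over the measured intervals only, which is usually the more meaningful number for a short run.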

About 3 minutes of using the machine, then 7 minutes of mostly idle.  The worst 
offenders are:

gcc.dg/atomic/atomic.exp completed in 522 seconds
gcc.dg/compat/struct-layout-1.exp completed in 253 seconds
gcc.c-torture/compile/compile.exp completed in 252 seconds
gcc.c-torture/compile/compile.exp completed in 252 seconds
gcc.c-torture/execute/builtins/builtins.exp completed in 193 seconds
gcc.c-torture/execute/builtins/builtins.exp completed in 177 seconds
gcc.dg/atomic/atomic.exp completed in 141 seconds
gcc.c-torture/execute/execute.exp completed in 134 seconds
gcc.c-torture/compile/compile.exp completed in 128 seconds
gcc.dg/guality/guality.exp completed in 112 seconds
gcc.dg/ubsan/ubsan.exp completed in 111 seconds
gcc.dg/torture/dg-torture.exp completed in 109 seconds
gcc.dg/guality/guality.exp completed in 108 seconds
gcc.dg/dg.exp completed in 103 seconds

(all that are over 100 seconds).

Curiously, when I run atomic.exp=stdatom\*.c:

  gcc.dg/atomic/atomic.exp completed in 30 seconds.

atomic.exp=c\*.c takes 522 seconds with 3, 2, 5 and 4 being the worst offenders.

I worry a little about the scaling overhead of the scheme.  The bin packing 
method I was thinking of would start with a larger number of small bins and 
then pack those into n bins using the actual testing time taken.  Large bins 
we’d just split in two.  I kinda expected a -j70 of atomic.exp to use more 
than 1 core.
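
To make the bin packing idea concrete, here is a minimal sketch that packs measured chunk times into n bins. The message does not say which packing algorithm was intended, so the greedy longest-first rule used here is an assumption, and the function name `pack` is made up for illustration. The timings are the "worst offenders" listed in this message.

```python
import heapq

def pack(chunks, n_bins):
    """Greedy longest-first packing of (name, seconds) chunks into n_bins.

    Returns one (total_seconds, [names]) pair per bin.
    """
    heap = [(0, i) for i in range(n_bins)]      # (current load, bin index)
    heapq.heapify(heap)
    members = [[] for _ in range(n_bins)]
    for name, secs in sorted(chunks, key=lambda c: c[1], reverse=True):
        load, i = heapq.heappop(heap)           # least-loaded bin so far
        members[i].append(name)
        heapq.heappush(heap, (load + secs, i))
    loads = {i: load for load, i in heap}
    return [(loads[i], members[i]) for i in range(n_bins)]

# Measured times in seconds, taken from the list of runs over 100 seconds.
chunks = [
    ("gcc.dg/atomic/atomic.exp", 522),
    ("gcc.dg/compat/struct-layout-1.exp", 253),
    ("gcc.c-torture/compile/compile.exp", 252),
    ("gcc.c-torture/compile/compile.exp", 252),
    ("gcc.c-torture/execute/builtins/builtins.exp", 193),
    ("gcc.c-torture/execute/builtins/builtins.exp", 177),
    ("gcc.dg/atomic/atomic.exp", 141),
    ("gcc.c-torture/execute/execute.exp", 134),
    ("gcc.c-torture/compile/compile.exp", 128),
    ("gcc.dg/guality/guality.exp", 112),
    ("gcc.dg/ubsan/ubsan.exp", 111),
    ("gcc.dg/torture/dg-torture.exp", 109),
    ("gcc.dg/guality/guality.exp", 108),
    ("gcc.dg/dg.exp", 103),
]

for n in (4, 70):
    longest = max(load for load, _ in pack(chunks, n))
    print(f"{n} bins: longest bin takes {longest} seconds")
```

With 70 bins the longest bin is still the single 522 second atomic.exp chunk, so no amount of packing helps until that chunk is split, which is the point of the "split large bins in two" step.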
