On Sep 12, 2014, at 9:32 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Here is my latest version of the patch.

I did a timing test:

Before:

    real    0m57.198s
    user    1m24.736s
    sys     0m19.816s

After:

    real    0m28.224s
    user    1m27.823s
    sys     0m22.374s

This is a -j70 run of check-objc on a 64-core POWER7. I picked an obscure test case that I had no reason to believe was anything other than ignored, that the patch was certainly not engineered for, and that is kinda small, to ensure the overhead would penalize it… 50.66% faster.

There is still room for improvement:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b swpd     free    buff    cache si so bi    bo    in    cs us sy  id wa st
 7  0    0 99046848 8515072 16748672  0  0  0     0     1     1  0  0 100  0  0
 7  0    0 99050432 8515072 16748736  0  0  0     0  7501  9022 13  3  84  0  0
 7  0    0 99029376 8515072 16749248  0  0  0     0  7320  8777 10  2  88  0  0
 7  0    0 99070656 8515072 16749440  0  0  0  1524  7162  8156  9  2  88  1  0
 7  0    0 99034560 8515072 16749824  0  0  0     0  8096 10363  7  2  91  0  0
 7  0    0 99030080 8515072 16750720  0  0  0     0  8798 11673  8  3  90  0  0
 9  0    0 99037376 8515072 16750080  0  0  0     0  9151 12598  9  3  87  0  0
 7  0    0 99024128 8515136 16750656  0  0  0     0  9078 13168  7  3  90  0  0
10  0    0 99034496 8515136 16751488  0  0  0  1800  8633 11675  8  3  88  1  0
 8  0    0 98986304 8515136 16751296  0  0  0     0 10159 14553  7  3  90  0  0
 7  0    0 99010112 8515520 16765824  0  0  0     0  8814 12036 10  3  87  0  0
 4  0    0 99014016 8515648 16773568  0  0  0     0  8091 10445  8  3  90  0  0
 4  0    0 99064832 8515712 16773120  0  0  0     0  5416  5071  9  2  89  0  0
 3  0    0 99118976 8515712 16773184  0  0  0 12716  4743  3533  4  1  92  2  0
 3  0    0 99077504 8515840 16773248  0  0  0     0  4525  3988  3  1  96  0  0
 2  0    0 99121152 8515840 16773824  0  0  0     0  4687  3757  3  1  97  0  0
 2  0    0 99117056 8515840 16773632  0  0  0     0  4334  3156  3  1  96  0  0
 2  0    0 99105728 8515840 16774336  0  0  0     0  4355  3246  3  1  96  0  0
 3  0    0 99069120 8515904 16773632  0  0  0   648  4902  4037  2  1  97  0  0
 1  0    0 99153664 8515968 16774592  0  0  0     0  3776  2711  2  1  97  0  0
 1  0    0 99151232 8515968 16774400  0  0  0     0   877   205  4  0  96  0  0
 1  0    0 99151424 8516032 16774528  0  0  0   236   774   466  2  0  97  0  0
 2  0    0 99148032 8516032 16774656  0  0  0     0   853   350  2  0  98  0  0
 2  0    0 99146176 8516032 16774656  0  0  0  1208  1630  1363  1  0  99  0  0
 1  0    0 99156032 8516352 16777152  0  0  0     0  1919  2104  1  0  99  0  0
 0  0    0 99189376 8516416 16776512  0  0  0     0  1181   799  2  0  98  0  0
 0  0    0 99189312 8516416 16776512  0  0  0     0   118    18  0  0 100  0  0
 0  0    0 99189312 8516416 16776512  0  0  0     0    90    18  0  0 100  0  0
 0  0    0 99187968 8516416 16776512  0  0  0  5468   196    42  0  0 100  0  0
 0  0    0 99187968 8516416 16776512  0  0  0     0    92    24  0  0 100  0  0
 0  0    0 99188032 8516416 16776512  0  0  0     0   146    37  0  0 100  0  0
 0  0    0 99188160 8516416 16776512  0  0  0   128    91    36  0  0 100  0  0
 1  0    0 99188160 8516416 16776512  0  0  0     0    74    16  0  0 100  0  0
 0  0    0 99188160 8516416 16776512  0  0  0     0    72    20  0  0 100  0  0
 0  0    0 99188224 8516416 16776512  0  0  0     0    76    22  0  0 100  0  0
 0  0    0 99188224 8516416 16776512  0  0  0     0   118    29  0  0 100  0  0

which averages to 95% idle. I changed:

    check_objc_parallelize = 6

to

    check_objc_parallelize = 70

to try and get it to go faster:

    real    0m21.252s
    user    3m21.035s
    sys     1m9.937s

:-( 7 seconds (24.6%) faster, but it consumes 146% more resources to see the benefit.

With the filesystem update set to 2 (instead of 10):

    real    0m22.478s
    user    4m38.564s
    sys     1m25.293s

and filesystem update 5:

    real    0m21.665s
    user    3m51.615s
    sys     1m16.005s

and filesystem update 20:

    real    0m22.681s
    user    3m2.746s
    sys     1m5.576s

a -j1 filesystem update 20 for comparison:

    real    1m48.127s
    user    1m17.953s
    sys     0m17.191s

a -j1 check_objc_parallelize 6 filesystem update 10 for comparison:

    real    1m47.552s
    user    1m17.410s
    sys     0m16.909s

a -j70 check_objc_parallelize 10000 filesystem update 10 for comparison:

    real    0m21.292s
    user    3m17.368s
    sys     1m10.106s

a -j70 check_objc_parallelize 10000 filesystem update 2 for comparison:

    real    0m21.976s
    user    4m37.600s
    sys     1m26.598s

a -j70 check_objc_parallelize 10000 filesystem update 200 for comparison:

    real    1m12.319s
    user    2m49.975s
    sys     1m4.537s

a -j70 check_objc_parallelize 12 filesystem update 10 for comparison:

    real    0m23.176s
    user    1m33.100s
    sys     0m25.722s

=======================================================

Switching over to check-c…

-j70 before, 94.4% idle:

    real    22m38.331s
    user    67m11.810s
    sys     13m40.974s

-j70 after (71.28% idle):

    real    10m41.448s
    user    160m24.871s
    sys     36m5.220s

143% more resource intensive to get a 52.8% faster check.

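For anyone wanting to redo the arithmetic on the check-c numbers above, here is a small Python sketch. It assumes that "faster" means the reduction in wall-clock (real) time and that "resources" means user plus sys CPU time. Those two definitions are an assumption on my part, though they do reproduce the 52.8% and 143% figures. The helper names parse_time and compare are made up for this illustration.

```python
def parse_time(text):
    """Convert a shell `time` value such as '22m38.331s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def compare(before, after):
    """Return (wall-clock reduction %, extra CPU %) between two runs.

    Each run is a dict with 'real', 'user' and 'sys' strings.
    Assumption: "resources" is taken to mean user + sys CPU time.
    """
    wall_before = parse_time(before["real"])
    wall_after = parse_time(after["real"])
    cpu_before = parse_time(before["user"]) + parse_time(before["sys"])
    cpu_after = parse_time(after["user"]) + parse_time(after["sys"])
    faster = 100 * (wall_before - wall_after) / wall_before
    extra_cpu = 100 * (cpu_after - cpu_before) / cpu_before
    return faster, extra_cpu


# The check-c -j70 timings quoted in this message.
check_c_before = {"real": "22m38.331s", "user": "67m11.810s", "sys": "13m40.974s"}
check_c_after = {"real": "10m41.448s", "user": "160m24.871s", "sys": "36m5.220s"}

faster, extra_cpu = compare(check_c_before, check_c_after)
print(f"{faster:.1f}% faster, {extra_cpu:.1f}% more CPU time")
```

The same function can be pointed at any of the check-objc timing triples to compare configurations.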
I still see a long tail on the test suite run (30 seconds per line):

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b swpd     free    buff    cache si so bi    bo    in    cs us sy  id wa st
 2  0    0 96997696 8707392 18756352  0  0  0     0     0     0  0  0 100  0  0
70  6    0 95642688 8709824 18719232  0  0  0  1231 23366 47068 54 23  18  4  0
66 10    0 95591872 8711744 18734976  0  0  0  3131 19437 37168 69 19   6  6  0
66  9    0 94251520 8716352 18780096  0  0  0  3304 18211 34222 70 18   7  6  0
60 16    0 94398400 8732288 18857152  0  0  0  2654 15808 29888 74 16   5  5  0
60 14    0 95059008 8749056 18973888  0  0  0  5678 17521 33177 72 17   6  5  0
60 12    0 94594880 8766656 18981376  0  0  0  2874 15686 28166 72 16   6  6  0
12  2    0 95515520 8773184 18997760  0  0  0  2109 14987 23655 48  9  39  4  0
 6  1    0 96211264 8774144 19010560  0  0  0  2111  5049  4993 14  1  85  0  0
 3  0    0 96441408 8774336 19016640  0  0  0   529  1870   980  7  0  93  0  0
 2  0    0 96493248 8774336 19016128  0  0  0   359   462    79  3  0  97  0  0
 2  0    0 96540992 8774400 19016000  0  0  0   417   458    89  3  0  97  0  0
 1  0    0 96564736 8774400 19012864  0  0  0   277   482   164  2  0  98  0  0
 1  0    0 96566080 8774400 19012928  0  0  0    16   194    31  2  0  98  0  0
 1  0    0 96574208 8774400 19012928  0  0  0     9   185    27  2  0  98  0  0
 1  0    0 96576192 8774400 19012672  0  0  0     9   197    32  2  0  98  0  0
 1  0    0 96584384 8774400 19012736  0  0  0     9   185    26  2  0  98  0  0
 1  0    0 96588608 8774400 19012480  0  0  0     9   187    27  2  0  98  0  0
 1  0    0 96583872 8774400 19012672  0  0  0    18   183    27  2  0  98  0  0
 1  0    0 96579072 8774528 19017472  0  0  0    32   230    55  2  0  98  0  0
 1  0    0 96603264 8774592 19016832  0  0  0    92   373   219  2  0  98  0  0
 1  0    0 96606528 8774592 19017984  0  0  0   111   357   241  2  0  98  0  0

About 3 minutes of using the machine, then 7 minutes of mostly idle.

The worst offenders are:

    gcc.dg/atomic/atomic.exp completed in 522 seconds
    gcc.dg/compat/struct-layout-1.exp completed in 253 seconds
    gcc.c-torture/compile/compile.exp completed in 252 seconds
    gcc.c-torture/compile/compile.exp completed in 252 seconds
    gcc.c-torture/execute/builtins/builtins.exp completed in 193 seconds
    gcc.c-torture/execute/builtins/builtins.exp completed in 177 seconds
    gcc.dg/atomic/atomic.exp completed in 141 seconds
    gcc.c-torture/execute/execute.exp completed in 134 seconds
    gcc.c-torture/compile/compile.exp completed in 128 seconds
    gcc.dg/guality/guality.exp completed in 112 seconds
    gcc.dg/ubsan/ubsan.exp completed in 111 seconds
    gcc.dg/torture/dg-torture.exp completed in 109 seconds
    gcc.dg/guality/guality.exp completed in 108 seconds
    gcc.dg/dg.exp completed in 103 seconds

(all that are over 100 seconds).

Curious: when I run atomic.exp=stdatom\*.c:

    gcc.dg/atomic/atomic.exp completed in 30 seconds.

atomic.exp=c\*.c takes 522 seconds, with 3, 2, 5 and 4 being the worst offenders.

I worry a little about the scaling overhead of the scheme. The bin packing method I was thinking of would just use a larger number of bins and then bin pack them into n bins using the actual testing time taken. Large bins, we’d just split in two.

I kinda expected a -j70 of atomic.exp to use more than 1 core.
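
A rough sketch of the bin packing idea mentioned above, in Python purely for illustration (the real testsuite harness is make plus Tcl/expect, and nothing below is taken from the patch). It makes two assumptions: a bin that is split in two yields halves that each take half the measured time, and the packing uses the greedy longest-first heuristic, which is one common choice and not necessarily the method intended here. The function names, the 150 second split limit and the choice of 4 workers are all hypothetical.

```python
import heapq


def split_large(bins, limit):
    """Split any bin longer than `limit` seconds in two, repeatedly.

    Assumption: each half takes exactly half of the measured time.
    """
    out = []
    pending = list(bins)
    while pending:
        name, secs = pending.pop()
        if secs > limit:
            pending.append((name + " [a]", secs / 2))
            pending.append((name + " [b]", secs / 2))
        else:
            out.append((name, secs))
    return out


def pack(bins, n):
    """Greedy packing: longest bin first, onto the least-loaded worker."""
    heap = [(0.0, i) for i in range(n)]  # (current load, worker index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n)]
    for name, secs in sorted(bins, key=lambda b: b[1], reverse=True):
        load, i = heapq.heappop(heap)
        assignment[i].append(name)
        heapq.heappush(heap, (load + secs, i))
    loads = [0.0] * n
    for load, i in heap:
        loads[i] = load
    return assignment, loads


# Measured times (seconds) of the bins that took over 100 seconds.
measured = [
    ("gcc.dg/atomic/atomic.exp", 522),
    ("gcc.dg/compat/struct-layout-1.exp", 253),
    ("gcc.c-torture/compile/compile.exp", 252),
    ("gcc.c-torture/compile/compile.exp", 252),
    ("gcc.c-torture/execute/builtins/builtins.exp", 193),
    ("gcc.c-torture/execute/builtins/builtins.exp", 177),
    ("gcc.dg/atomic/atomic.exp", 141),
    ("gcc.c-torture/execute/execute.exp", 134),
    ("gcc.c-torture/compile/compile.exp", 128),
    ("gcc.dg/guality/guality.exp", 112),
    ("gcc.dg/ubsan/ubsan.exp", 111),
    ("gcc.dg/torture/dg-torture.exp", 109),
    ("gcc.dg/guality/guality.exp", 108),
    ("gcc.dg/dg.exp", 103),
]

pieces = split_large(measured, limit=150)
assignment, loads = pack(pieces, n=4)
for i, load in enumerate(loads):
    print(f"worker {i}: {load:.1f}s in {len(assignment[i])} pieces")
```

With greedy packing, no worker ends up with more than the average load plus one piece, so keeping the pieces small is what bounds the long tail.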