https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
--- Comment #25 from Anthony <prop_design at protonmail dot com> ---
(In reply to Anthony from comment #24)
> (In reply to rguent...@suse.de from comment #23)
> > On Sun, 28 Jun 2020, prop_design at protonmail dot com wrote:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
> > >
> > > --- Comment #22 from Anthony <prop_design at protonmail dot com> ---
> > > (In reply to Thomas Koenig from comment #21)
> > > > Another question: Is there anything left to be done with the
> > > > vectorizer, or could we remove that dependency?
> > >
> > > thanks for looking into this again for me. i'm surprised it worked the
> > > same on Linux, but knowing that, at least helps debug this issue some
> > > more. i'm not sure about the vectorizer question, maybe that question
> > > was intended for someone else. the runtimes seem good as is though. i
> > > doubt the auto-parallelization will add much speed. but it's an
> > > interesting feature that i've always hoped would work. i've never got
> > > it to work though. the only code that did actually implement something
> > > was Intel Fortran. it implemented one trivial loop, but it slowed the
> > > code down instead of speeding it up. the output from gfortran shows
> > > more loops it wants to run in parallel. they aren't important ones.
> > > but something would be better than nothing. if it slowed the code
> > > down, i would just not use it.
> >
> > GCC adds runtime checks for a minimal number of iterations before
> > dispatching to the parallelized code - I guess we simply never hit
> > the threshold. This is configurable via --param parloops-min-per-thread,
> > the default is 100, the default number of threads is determined the same
> > as for OpenMP so you can probably tune that via OMP_NUM_THREADS.
>
> thanks for that tip. i tried changing the parloops parameters but no luck.
> the only difference was the max thread use went from 2 to 3. core use was
> the same.
>
> i added the following and some variations of these:
>
> --param parloops-min-per-thread=2 (the default was 100 like you said)
> --param parloops-chunk-size=1 (the default was zero so i removed this
> parameter later)
> --param parloops-schedule=auto (tried all options except guided, the
> default is static)
>
> i was able to check that they were set via:
>
> --help=param -Q
>
> some other things i tried was adding -mthreads and removing -static. but
> so far no luck. i also tried using -mthreads instead of -pthread.
>
> i should make clear i'm testing PROP_DESIGN_MAPS, not MP_PROP_DESIGN.
> MP_PROP_DESIGN is ancient and the added benchmarking loops were messing
> with the ability of the optimizer to auto-parallelize (in the past at
> least).

I did more testing and the added options actually slow the code way down. however, it still only uses one core. from what i can tell, if i set OMP_PLACES it doesn't seem to work. i saw a thread from years ago where someone had the same problem. i think OMP_PLACES might be working on linux but not on windows; that's what the thread i found was saying. i don't really know, but i've exhausted all the possibilities at this point. the only thing i know for sure is i can't get it to use anything more than one core.

--- Comment #27 from Anthony <prop_design at protonmail dot com> ---
so after trying a bunch of things, i think the final problem may be this. i get the following result when i try to set thread affinity:

set GOMP_CPU_AFFINITY="0 1"

gives the following feedback at run time:

libgomp: Affinity not supported on this configuration

i have to close the command prompt window to stop the program. the program doesn't run properly if i try to set thread affinity. so this still makes me think it might work on linux and not windows 10, but i have no way to test that. the extra threads that auto-parallelization creates will only go to one core, on my machine at least.
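Collecting the flags and environment variables tried across these comments into one place, a minimal sketch of the build and run settings under discussion. The flags and variables are the ones named in the thread; the source file name `prop_design_maps.f90` is an assumption, so the build step is guarded and simply skipped when gfortran or that file is not present.

```shell
#!/bin/sh
# Sketch: auto-parallelized build of PROP_DESIGN_MAPS (file name assumed).
SRC=prop_design_maps.f90
FLAGS="-O3 -ftree-parallelize-loops=4 \
  --param parloops-min-per-thread=2 \
  --param parloops-schedule=auto"

# Only attempt the build where gfortran and the source file are available.
if command -v gfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
  gfortran $FLAGS "$SRC" -o prop_design_maps
fi

# libgomp reads these at run time; on the Windows toolchain discussed
# here, the affinity variable is rejected with
# "libgomp: Affinity not supported on this configuration".
export OMP_NUM_THREADS=4
export GOMP_CPU_AFFINITY="0 1 2 3"
echo "env: OMP_NUM_THREADS=$OMP_NUM_THREADS, GOMP_CPU_AFFINITY=$GOMP_CPU_AFFINITY"
```

Whether libgomp honours GOMP_CPU_AFFINITY depends on how the toolchain itself was built, not on compile flags for the program, which would match the behaviour reported above: the same settings working on Linux while the Windows build reports affinity as unsupported.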