Hmmm... well, it looks like we are doing the right thing and running unbound when oversubscribed like this. I don't have any brilliant ideas as to why it would run so much more slowly than 1.6.5 in that situation. It could be that yield-when-idle (mpi_yield_when_idle) is broken in 1.8.3. I'll try to dig into that notion a bit.
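In case it helps when comparing further parameter dumps, here is a rough sketch that wraps the sort-and-sed steps described in the quoted message below into one helper. The function names (to_16_style, diff_params) are made up for illustration. Applying the sed rewrite only to the 1.8.3 dump is an assumption drawn from the attachment names, and the three substitutions are copied from the quoted command as-is, so they will also touch any other text that happens to contain "default", "true" or "false".

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the assumption is that
# just the 1.8.3 dump needs rewriting to match the 1.6.5 spellings.

# Rewrite a 1.8-style dump (stdin) using the substitutions from the thread.
to_16_style() {
    sed -e 's/default/default value/g' -e 's/true/1/g' -e 's/false/0/g'
}

# diff_params OLD_DUMP NEW_DUMP
# Sort both dumps, rewrite the new one, and print the differing lines.
# Returns diff's exit status: 0 if identical, 1 if they differ.
diff_params() {
    old_tmp=$(mktemp) || return 2
    new_tmp=$(mktemp) || { rm -f "$old_tmp"; return 2; }
    sort "$1" > "$old_tmp"
    to_16_style < "$2" | sort > "$new_tmp"
    diff "$old_tmp" "$new_tmp"
    status=$?
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}
```

Something like `diff_params output-165/output.1.00 output-183/output.1.00` would then show only the parameters whose values really differ between the two launches (the directory names here are placeholders).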
> On Dec 9, 2014, at 10:39 AM, Eric Chamberland
> <eric.chamberl...@giref.ulaval.ca> wrote:
>
> Hi again,
>
> I sorted and "seded" (cat outpout.1.00 |sed 's/default/default value/g'|sed
> 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
>
> mpirun --output-filename output -mca mpi_show_mca_params all
> --report-bindings -np 32 myprog
>
> between a launch with 165 vs 183.
>
> The diff may be interesting but I can't interpret everything that is
> written...
>
> The files are attached...
>
> Thanks,
>
> Eric
>
> On 12/09/2014 01:02 PM, Eric Chamberland wrote:
>> On 12/09/2014 12:24 PM, Ralph Castain wrote:
>>> Can you provide an example cmd line you use to launch one of these
>>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
>>> series, and we bind by default in 1.8 - the combination may be causing
>>> you a problem.
>>
>> I very simply launch:
>>
>> "mpirun -np 32 myprog"
>>
>> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
>>
>> Eric
>>
>>>
>>>
>>>> On Dec 9, 2014, at 9:14 AM, Eric Chamberland
>>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>>>
>>>> Hi,
>>>>
>>>> we were used to do oversubscribing just to do code validation in
>>>> nightly automated parallel runs of our code.
>>>>
>>>> I just compiled openmpi 1.8.3 and launched the whole suit of
>>>> sequential/parallel tests and noticed a *major* slowdown in
>>>> oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
>>>>
>>>> For example, on my computer (2 cpu), a validation test of 64
>>>> processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
>>>> execute, while the very same test compiled with 1.6.5 took only 7.4
>>>> seconds!
>>>>
>>>> To have this result with 1.6.5 we had to set the variable
>>>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in
>>>> 1.8.3 when I launch more processes than number of core in my
>>>> computer, even if it is still mentioned to work (see
>>>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
>>>> However, when I launch with fewer processes than number of core, then
>>>> it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
>>>> same behavior in 1.6.5.
>>>>
>>>> I tried to launch with a host file like this:
>>>>
>>>> localhost slots=2
>>>>
>>>> but it changed nothing...
>>>>
>>>> What do I do wrong?
>>>>
>>>> Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
>>>>
>>>> Is there a compilation option that I have to enable in 1.8.3?
>>>>
>>>> Here are the config.log and "ompi_info --all" files for both versions
>>>> of mpi:
>>>>
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
>>>>
>>>> Thanks,
>>>>
>>>> Eric
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/12/25938.php
>>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/12/25940.php
>
> <output.1.00.filtre.165.sorted><output.1.00.filtre.183.sorted.seded>