yield when idle is broken on 1.8. Fixing now.

-Nathan

On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
> Hmmm….well, it looks like we are doing the right thing and running unbound
> when oversubscribed like this. I don’t have any brilliant idea why it would
> be running so slowly in that situation when compared with 1.6.5 - it could be
> that yield-when-idle is borked. I’ll try to dig into that notion a bit.
>
>
> > On Dec 9, 2014, at 10:39 AM, Eric Chamberland
> > <eric.chamberl...@giref.ulaval.ca> wrote:
> >
> > Hi again,
> >
> > I sorted and "seded" (cat outpout.1.00 |sed 's/default/default value/g'|sed
> > 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
> >
> > mpirun --output-filename output -mca mpi_show_mca_params all
> > --report-bindings -np 32 myprog
> >
> > between a launch with 165 vs 183.
> >
> > The diff may be interesting but I can't interpret everything that is
> > written...
> >
> > The files are attached...
> >
> > Thanks,
> >
> > Eric
> >
> > On 12/09/2014 01:02 PM, Eric Chamberland wrote:
> >> On 12/09/2014 12:24 PM, Ralph Castain wrote:
> >>> Can you provide an example cmd line you use to launch one of these
> >>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
> >>> series, and we bind by default in 1.8 - the combination may be causing
> >>> you a problem.
> >>
> >> I very simply launch:
> >>
> >> "mpirun -np 32 myprog"
> >>
> >> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
> >>
> >> Eric
> >>
> >>>
> >>>
> >>>> On Dec 9, 2014, at 9:14 AM, Eric Chamberland
> >>>> <eric.chamberl...@giref.ulaval.ca> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> we were used to do oversubscribing just to do code validation in
> >>>> nightly automated parallel runs of our code.
> >>>>
> >>>> I just compiled openmpi 1.8.3 and launched the whole suit of
> >>>> sequential/parallel tests and noticed a *major* slowdown in
> >>>> oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
> >>>>
> >>>> For example, on my computer (2 cpu), a validation test of 64
> >>>> processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
> >>>> execute, while the very same test compiled with 1.6.5 took only 7.4
> >>>> seconds!
> >>>>
> >>>> To have this result with 1.6.5 we had to set the variable
> >>>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in
> >>>> 1.8.3 when I launch more processes than number of core in my
> >>>> computer, even if it is still mentioned to work (see
> >>>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
> >>>> However, when I launch with fewer processes than number of core, then
> >>>> it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
> >>>> same behavior in 1.6.5.
> >>>>
> >>>> I tried to launch with a host file like this:
> >>>>
> >>>> localhost slots=2
> >>>>
> >>>> but it changed nothing...
> >>>>
> >>>> What do I do wrong?
> >>>>
> >>>> Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
> >>>>
> >>>> Is there a compilation option that I have to enable in 1.8.3?
> >>>>
> >>>> Here are the config.log and "ompi_info --all" files for both versions
> >>>> of mpi:
> >>>>
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Eric
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>> Link to this post:
> >>>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/users/2014/12/25938.php
> >>>
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/users/2014/12/25940.php
> >
> > <output.1.00.filtre.165.sorted><output.1.00.filtre.183.sorted.seded>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/12/25942.php