yield when idle is broken on 1.8. Fixing now.

-Nathan

On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
> Hmmm….well, it looks like we are doing the right thing and running unbound 
> when oversubscribed like this. I don’t have any brilliant idea why it would 
> be running so slowly in that situation when compared with 1.6.5 - it could be 
> that yield-when-idle is borked. I’ll try to dig into that notion a bit.
> 
> 
> > On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
> > <eric.chamberl...@giref.ulaval.ca> wrote:
> > 
> > Hi again,
> > 
> > I sorted and "seded" (cat output.1.00 |sed 's/default/default value/g'|sed 
> > 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
> > 
> > mpirun --output-filename output -mca mpi_show_mca_params all 
> > --report-bindings -np 32 myprog
> > 
> > for a launch with 1.6.5 and one with 1.8.3.
> > 
> > The diff may be interesting but I can't interpret everything that is 
> > written...
> > 
> > The files are attached...
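
For anyone repeating this comparison, the normalise, sort and diff step can be sketched as below. This is a minimal sketch, not the exact procedure used above: the two input files hold made-up placeholder lines (not real Open MPI output), the names old.txt and new.txt are hypothetical, and the sed expressions are the ones from the message, applied only to the newer dump so that its wording matches the older one.

```shell
# Scratch directory so the sketch does not touch any real files.
workdir=$(mktemp -d)

# Placeholder parameter dumps; these lines are invented for illustration
# and are NOT real Open MPI output.
printf '%s\n' 'param_b=0 (default value)' 'param_a=1 (default value)' > "$workdir/old.txt"
printf '%s\n' 'param_a=true (default)' 'param_b=true (default)' > "$workdir/new.txt"

# The older dump only needs sorting.
sort "$workdir/old.txt" > "$workdir/old.sorted"

# The newer dump is rewritten to the older wording first, then sorted.
sed -e 's/default/default value/g' \
    -e 's/true/1/g' \
    -e 's/false/0/g' "$workdir/new.txt" | sort > "$workdir/new.sorted"

# diff exits with status 1 when the files differ, so keep going regardless.
changes=$(diff "$workdir/old.sorted" "$workdir/new.sorted" || true)
echo "$changes"

rm -rf "$workdir"
```

Sorting both sides before diffing keeps the comparison independent of the order in which the parameters were printed, so only lines whose values really differ show up.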
> > 
> > Thanks,
> > 
> > Eric
> > 
> > On 12/09/2014 01:02 PM, Eric Chamberland wrote:
> >> On 12/09/2014 12:24 PM, Ralph Castain wrote:
> >>> Can you provide an example cmd line you use to launch one of these
> >>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
> >>> series, and we bind by default in 1.8 - the combination may be causing
> >>> you a problem.
> >> 
> >> I very simply launch:
> >> 
> >> "mpirun -np 32 myprog"
> >> 
> >> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
> >> 
> >> Eric
> >> 
> >>> 
> >>> 
> >>>> On Dec 9, 2014, at 9:14 AM, Eric Chamberland
> >>>> <eric.chamberl...@giref.ulaval.ca> wrote:
> >>>> 
> >>>> Hi,
> >>>> 
> >>>> we are used to oversubscribing, just to do code validation in
> >>>> nightly automated parallel runs of our code.
> >>>> 
> >>>> I just compiled openmpi 1.8.3 and launched the whole suite of
> >>>> sequential/parallel tests and noticed a *major* slowdown in
> >>>> oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
> >>>> 
> >>>> For example, on my computer (2 cpu), a validation test of 64
> >>>> processes launched with 1.8.3 took 1500 seconds (~25 minutes) to
> >>>> execute, while the very same test compiled with 1.6.5 took only 7.4
> >>>> seconds!
> >>>> 
> >>>> To get this result with 1.6.5 we had to set the variable
> >>>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effect in
> >>>> 1.8.3 when I launch more processes than the number of cores in my
> >>>> computer, even though it is still documented as working (see
> >>>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
> >>>> However, when I launch with fewer processes than the number of cores,
> >>>> it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
> >>>> same behavior as in 1.6.5.
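
As a rough illustration of the setting discussed above, the parameter can be supplied either through the environment or on the mpirun command line. This is only a sketch: ./myprog is a placeholder binary, the rank count of 32 is borrowed from the example command elsewhere in this thread, and, as reported here, the setting appeared to have no effect on 1.8.3.

```shell
# Set the MCA parameter through the environment, as in the original report.
export OMPI_MCA_mpi_yield_when_idle=1

NP=32            # rank count borrowed from the thread's example command
PROG=./myprog    # placeholder for the real test binary

# Command-line form of the same setting.
CMD="mpirun --mca mpi_yield_when_idle 1 -np $NP $PROG"

# Only launch when both mpirun and the program are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -x "$PROG" ]; then
    $CMD
else
    echo "skipping, would run: $CMD"
fi
```

Either form alone should be enough; both are shown here only so the two spellings can be compared side by side.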
> >>>> 
> >>>> I tried to launch with a host file like this:
> >>>> 
> >>>> localhost slots=2
> >>>> 
> >>>> but it changed nothing...
> >>>> 
> >>>> What am I doing wrong?
> >>>> 
> >>>> Is it possible to recover the 1.6.5 performance when oversubscribing?
> >>>> 
> >>>> Is there a compilation option that I have to enable in 1.8.3?
> >>>> 
> >>>> Here are the config.log and "ompi_info --all" files for both versions
> >>>> of mpi:
> >>>> 
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
> >>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
> >>>> 
> >>>> Thanks,
> >>>> 
> >>>> Eric
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>> Link to this post:
> >>>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
> >>> 
> > 
> > <output.1.00.filtre.165.sorted><output.1.00.filtre.183.sorted.seded>
> 
