Hi Nathan,

I pulled your commit d0da29351f9 and tested it against our example.

It now works perfectly. Strangely, I can even unset "OMPI_MCA_mpi_yield_when_idle=1" and it doesn't seem to take any longer.

Can I apply the patch to a fresh "1.8.3" and expect it to work?

Another question: how can I retrieve the SHA for 1.8.3? (Should releases be tagged in the repository? Is it normal that I only see a "dev" tag?)

Thanks,

Eric


On 12/09/2014 04:19 PM, Nathan Hjelm wrote:

yield when idle is broken on 1.8. Fixing now.

-Nathan

On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
Hmmm….well, it looks like we are doing the right thing and running unbound when 
oversubscribed like this. I don’t have any brilliant idea why it would be 
running so slowly in that situation when compared with 1.6.5 - it could be that 
yield-when-idle is borked. I’ll try to dig into that notion a bit.


On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
<eric.chamberl...@giref.ulaval.ca> wrote:

Hi again,

I sorted and "seded" (cat output.1.00 | sed 's/default/default value/g' | sed 
's/true/1/g' | sed 's/false/0/g') the output.1.00 file from:

mpirun --output-filename output -mca mpi_show_mca_params all --report-bindings 
-np 32 myprog

for a launch with 1.6.5 and for one with 1.8.3.

The diff may be interesting, but I can't interpret everything in it...

The files are attached...
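
The comparison described above can be sketched as a small shell helper. This is only an illustration, not the exact procedure used for the attachments: it assumes, as the attachment names suggest, that only the 1.8.3 dump needed the sed substitutions, and the function names `to_old_wording` and `compare_dumps` are made up for this sketch.

```shell
#!/bin/sh
# Rewrite a 1.8-style parameter dump into the older wording, using the same
# three substitutions as the sed pipeline quoted above.
to_old_wording() {
    sed -e 's/default/default value/g' \
        -e 's/true/1/g' \
        -e 's/false/0/g' "$1"
}

# Sort both dumps, normalize the newer one, and print the differences.
# Returns diff's exit status: 0 when identical, 1 when they differ.
compare_dumps() {
    old_dump=$1
    new_dump=$2
    old_tmp=$(mktemp) || return 2
    new_tmp=$(mktemp) || { rm -f "$old_tmp"; return 2; }
    sort "$old_dump" > "$old_tmp"
    to_old_wording "$new_dump" | sort > "$new_tmp"
    diff "$old_tmp" "$new_tmp"
    status=$?
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}
```

With two dumps saved under placeholder names, `compare_dumps params.165.txt params.183.txt` would print only the parameters whose values or origins differ between the two versions.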

Thanks,

Eric

On 12/09/2014 01:02 PM, Eric Chamberland wrote:
On 12/09/2014 12:24 PM, Ralph Castain wrote:
Can you provide an example cmd line you use to launch one of these
tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
series, and we bind by default in 1.8 - the combination may be causing
you a problem.

I very simply launch:

"mpirun -np 32 myprog"

Maybe the result of "-mca mpi_show_mca_params all" would be insightful?

Eric



On Dec 9, 2014, at 9:14 AM, Eric Chamberland
<eric.chamberl...@giref.ulaval.ca> wrote:

Hi,

we routinely oversubscribe, just for code validation, in the
nightly automated parallel runs of our code.

I just compiled Open MPI 1.8.3, launched the whole suite of
sequential/parallel tests, and noticed a *major* slowdown in
oversubscribed parallel tests with 1.8.3 compared to 1.6.5.

For example, on my computer (2 CPUs), a validation test of 64
processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
execute, while the very same test compiled with 1.6.5 took only 7.4
seconds!

To get this result with 1.6.5 we had to set the variable
"OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effect in
1.8.3 when I launch more processes than the number of cores in my
computer, even though it is still documented as working (see
http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
However, when I launch fewer processes than the number of cores,
it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
same behavior as in 1.6.5.
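
For reference, the yield-when-idle setting discussed above can be requested in two ways: through the environment variable, or as an MCA parameter on the mpirun command line. The sketch below wraps both forms. The function names and the `MPIRUN` override variable are assumptions made for this example; `myprog` and the process counts come from the thread.

```shell
#!/bin/sh
# Launcher to use; override MPIRUN to point at a specific Open MPI install
# (for example, to compare a 1.6.5 build against a 1.8.3 build).
MPIRUN=${MPIRUN:-mpirun}

# Request yield-when-idle through the environment, for this launch only.
run_with_yield_env() {
    nprocs=$1
    shift
    OMPI_MCA_mpi_yield_when_idle=1 "$MPIRUN" -np "$nprocs" "$@"
}

# Request the same behavior as an MCA parameter on the command line.
run_with_yield_cli() {
    nprocs=$1
    shift
    "$MPIRUN" --mca mpi_yield_when_idle 1 -np "$nprocs" "$@"
}
```

Running `time run_with_yield_env 64 myprog` under each installation would give a rough side-by-side of the oversubscribed timings reported here.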

I tried to launch with a host file like this:

localhost slots=2

but it changed nothing...

What am I doing wrong?

Is it possible to get the 1.6.5 "performance" back for oversubscription?

Is there a compilation option that I have to enable in 1.8.3?

Here are the config.log and "ompi_info --all" files for both versions
of mpi:

http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz

Thanks,

Eric




_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25936.php


<output.1.00.filtre.165.sorted><output.1.00.filtre.183.sorted.seded>
