Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

Ralph Castain Tue, 9 Dec 2014 16:02:12 -0500 (EST)

Hmmm….well, it looks like we are doing the right thing and running unbound when 
oversubscribed like this. I don’t have any brilliant idea why it would be 
running so slowly in that situation when compared with 1.6.5 - it could be that 
yield-when-idle is borked. I’ll try to dig into that notion a bit.
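
For anyone who wants to narrow this down, here is a minimal sketch of three launches that could be timed against each other. It assumes the process count and the `myprog` name from the quoted message below. The `run` helper and its `DRY_RUN` variable are hypothetical additions that only print the commands by default. Passing the parameter with `--mca` rather than through the environment, and requesting `--bind-to none` explicitly, are guesses at what is worth comparing, not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

NP=32        # process count used in the thread
PROG=myprog  # placeholder program name from the thread

# 1. Baseline: the plain launch that shows the slowdown.
run mpirun -np "$NP" "$PROG"

# 2. Same launch, with yield-when-idle passed on the command line
#    instead of via OMPI_MCA_mpi_yield_when_idle.
run mpirun --mca mpi_yield_when_idle 1 -np "$NP" "$PROG"

# 3. As above, but with binding explicitly disabled and reported,
#    so the binding state is visible in the output.
run mpirun --bind-to none --report-bindings \
    --mca mpi_yield_when_idle 1 -np "$NP" "$PROG"
```

Setting `DRY_RUN=0` and prefixing each launch with `time` would give numbers to compare across the two Open MPI versions.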



> On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
> <eric.chamberl...@giref.ulaval.ca> wrote:
> 
> Hi again,
> 
> I sorted and "seded" (cat output.1.00 |sed 's/default/default value/g'|sed 
> 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
> 
> mpirun --output-filename output -mca mpi_show_mca_params all 
> --report-bindings -np 32 myprog
> 
> for a launch with 1.6.5 and a launch with 1.8.3.
> 
> The diff may be interesting but I can't interpret everything that is 
> written...
> 
> The files are attached...
> 
> Thanks,
> 
> Eric
> 
> On 12/09/2014 01:02 PM, Eric Chamberland wrote:
>> On 12/09/2014 12:24 PM, Ralph Castain wrote:
>>> Can you provide an example cmd line you use to launch one of these
>>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
>>> series, and we bind by default in 1.8 - the combination may be causing
>>> you a problem.
>> 
>> I very simply launch:
>> 
>> "mpirun -np 32 myprog"
>> 
>> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
>> 
>> Eric
>> 
>>> 
>>> 
>>>> On Dec 9, 2014, at 9:14 AM, Eric Chamberland
>>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> we routinely oversubscribe, purely for code validation, in the
>>>> nightly automated parallel runs of our code.
>>>> 
>>>> I just compiled openmpi 1.8.3 and launched the whole suite of
>>>> sequential/parallel tests and noticed a *major* slowdown in
>>>> oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
>>>> 
>>>> For example, on my computer (2 cpu), a validation test of 64
>>>> processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
>>>> execute, while the very same test compiled with 1.6.5 took only 7.4
>>>> seconds!
>>>> 
>>>> To get this result with 1.6.5 we had to set the variable
>>>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effect in
>>>> 1.8.3 when I launch more processes than the number of cores in my
>>>> computer, even though it is still documented as working (see
>>>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
>>>> However, when I launch with fewer processes than the number of cores,
>>>> it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
>>>> same behavior as in 1.6.5.
>>>> 
>>>> I tried to launch with a host file like this:
>>>> 
>>>> localhost slots=2
>>>> 
>>>> but it changed nothing...
>>>> 
>>>> What am I doing wrong?
>>>> 
>>>> Is it possible to recover the 1.6.5 performance for oversubscription?
>>>> 
>>>> Is there a compilation option that I have to enable in 1.8.3?
>>>> 
>>>> Here are the config.log and "ompi_info --all" files for both versions
>>>> of mpi:
>>>> 
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
>>>> 
>>>> Thanks,
>>>> 
>>>> Eric
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
>>> 
>> 
> 
> <output.1.00.filtre.165.sorted><output.1.00.filtre.183.sorted.seded>
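
The comparison Eric describes above can be sketched roughly as follows. The attachment names suggest that the sed substitutions were applied only to the 1.8.3 dump, so that its wording lines up with the 1.6.5 dump, and that both files were sorted; this sketch assumes that reading. The function name and the input file names are hypothetical.

```shell
#!/bin/sh
# Rewrite a 1.8.3 parameter dump with the substitutions quoted in the thread,
# then sort it so that it can be diffed against the sorted 1.6.5 dump.
normalize_183() {
    sed -e 's/default/default value/g' \
        -e 's/true/1/g' \
        -e 's/false/0/g' "$1" | sort
}

# Hypothetical file names; replace them with the real dumps from each version.
OLD=output.165.txt
NEW=output.183.txt

if [ -f "$OLD" ] && [ -f "$NEW" ]; then
    sort "$OLD" > "$OLD.sorted"
    normalize_183 "$NEW" > "$NEW.sorted.seded"
    # diff exits non-zero when the files differ; that is expected here.
    diff "$OLD.sorted" "$NEW.sorted.seded" || true
fi
```

Folding the three sed invocations into one call with several `-e` expressions gives the same result as the chained pipeline, since the expressions are applied in order to each line.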
