Are two mpirun instances supposed to talk to each other in order to avoid binding to the same cpu/socket/... twice?
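As far as I know, separate mpirun invocations do not coordinate with each other, so each one maps and binds its ranks starting from the same first cores. Below is a minimal sketch of one way to keep two concurrent jobs apart on one node. It rests on three assumptions: that the --cpu-set option described in the 1.8-series mpirun man page is available on this installation (check `mpirun --help`), that logical CPUs 0-3 are free on the node, and that caseA and caseB are hypothetical OpenFOAM case directories, one per job.

```shell
# Assumption: logical CPUs 0-3 exist on this node and are otherwise idle.
JOB_A_CPUS="0,1"
JOB_B_CPUS="2,3"

if command -v mpirun >/dev/null 2>&1; then
    # caseA and caseB are hypothetical case directories, one per job.
    # Each job is restricted to its own CPU list and reports its bindings.
    (cd caseA && mpirun -np 2 --cpu-set "$JOB_A_CPUS" --bind-to core \
        --report-bindings simpleFoam -parallel) &
    (cd caseB && mpirun -np 2 --cpu-set "$JOB_B_CPUS" --bind-to core \
        --report-bindings simpleFoam -parallel) &
    wait    # block until both background jobs have finished
else
    echo "mpirun not found in PATH" >&2
fi
```

A simpler alternative is to pass --bind-to none to both jobs, which disables binding and leaves placement to the Linux scheduler.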

On 11/24/2015 4:46 PM, Aurélien Bouteiller wrote:
You can use the 'mpirun -report-bindings' option to see how your processes have been mapped in your deployment. If you are unhappy with the default, you can play with the -map-by option.
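For instance, here is a sketch of how those options might be combined with the command from the original post. The choice of mapping and binding by core is only an assumption for illustration; "machines" and simpleFoam are taken from that post.

```shell
# Assumed policy for illustration: one rank per core, bound to that core,
# with the resulting bindings reported at startup.
BIND_OPTS="--map-by core --bind-to core --report-bindings"

if command -v mpirun >/dev/null 2>&1; then
    # $BIND_OPTS is intentionally unquoted so that it splits into options.
    mpirun -np 40 -hostfile machines $BIND_OPTS simpleFoam -parallel
else
    echo "mpirun not found in PATH" >&2
fi
```

The reported bindings should show at a glance whether two ranks ended up on the same core.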

Aurélien
--
Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/

On 24 Nov 2015, at 02:29, 김건홍 (KIM GEON HONG) <geonhong....@hhi.co.kr> wrote:

I use Open MPI 1.8.5.
The command is as follows:
$ mpirun -np 40 -hostfile machines simpleFoam -parallel
and the host file "machines" contains:
hpcnode127 cpu=20
hpcnode128 cpu=20
Another interesting symptom is that
if I run two mpiruns with the -np 2 option on the same node, those two mpiruns run on the same cpus. As shown in the following figure, only two cpus are working while four simpleFoam processes are running.
<image001.png>
Thank you.
Best regards,
Geon-Hong Kim.
*From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of* Ralph Castain
*Sent:* Tuesday, November 24, 2015 4:11 PM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] OpenMPI with infiniband, child processes of mpirun are missing or overlapped on the same cpu

Could you please tell us what version of OpenMPI you are using, and the command line you used to execute the job?
Thanks
Ralph

    On Nov 23, 2015, at 11:05 PM, 김건홍 (KIM GEON HONG)
    <geonhong....@hhi.co.kr> wrote:
    Hello,
    I tried to run a parallel computation (OpenFOAM) using Open MPI
    on an HPC cluster connected with InfiniBand.
    When I ran a job using mpirun over a couple of nodes (20 cpus per
    node), the computation was not accelerated as I expected.
    For example, I ran the job over 40 cpus on 2 nodes, and I checked
    cpu usage and processes via the top command.
    I expected 20 processes to be running on each node, but I found
    that only 19 processes were running and one cpu was idle while
    the others were in use.
    The following is a capture of the top output.
    As you can see, Cpu1 is idle and there are only 19 simpleFoam
    processes!
    <image002.png>
    I have no idea why this happens.
    Sometimes a cpu is idle while 20 processes are running, but
    in that case two processes each run at 50% cpu usage.
    That is, those two different processes are assigned to the same cpu.
    Please refer to the attached file for the required information about
    the cluster and its environment.
    The output of the "ulimit -l" command on both nodes is "unlimited".
    Additional information on the OpenFabrics-based network is as follows:
    1. OpenFabrics version: MLNX_OFED_LINUX-2.4.1.0.0
    2. Linux/kernel info: RHEL6.5 2.6.32-431.el6.x86_64
       - Linux distro/version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
       - Kernel version: 2.6.32-431.el6.x86_64
    3. Subnet manager: infiniband B class
    Thank you.
    Best regards,
    Geon-Hong Kim.
    *-----------------------------------*
    *Geon-Hong Kim*
    Engineer, Ph.D.
    Performance Evaluation Research Department
    Hyundai Maritime Research Institute
    Hyundai Heavy Industries Co., Ltd.
    Office +82-52-203-8053
    Fax +82-52-250-9675
    Mobile +82-10-3084-1357
    *-----------------------------------*
    <system_info.tar.bz2>
    _______________________________________________
    users mailing list
    us...@open-mpi.org
    Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
    Link to this post: http://www.open-mpi.org/community/lists/users/2015/11/28100.php




