Carlos,

Open MPI 3.0.2 has been released and contains several bug fixes, so I
encourage you to upgrade and try again.



If it still does not work, can you please run

mpirun --mca oob_base_verbose 10 ...

and then compress and post the output?


Out of curiosity, would

mpirun --mca routed_radix 1 ...

work in your environment?


Once we can analyze the logs, we should be able to figure out what is going wrong.


Cheers,

Gilles

On 6/29/2018 4:10 AM, carlos aguni wrote:
Just realized my email wasn't sent to the archive.

On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni <aguni...@gmail.com> wrote:

    Hi!

    Thank you all for your replies, Jeff, Gilles and rhc.

    Thank you Jeff and rhc for clarifying some of Open MPI's
    internals for me.

    >> FWIW: we never send interface names to other hosts - just dot
    >> addresses
    > Should have clarified - when you specify an interface name for the
    > MCA param, then it is the interface name that is transferred as
    > that is the value of the MCA param. However, once we determine our
    > address, we only transfer dot addresses between ourselves

    If only dot addresses are sent to the hosts, then why doesn't
    Open MPI follow the system routing table, as `ip route get <other
    host IP>` does, instead of picking an address at random? Is this
    expected behaviour? Can it be changed?

    Sorry. As Gilles pointed out, I forgot to mention which Open MPI
    version I was using. I'm using Open MPI 3.0.0 (GCC 7.3.0) from
    OpenHPC on CentOS 7.5.

    > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...

    I cannot just exclude that interface, because after that I want to
    add another computer that's on a different network. And this is
    where things get messy :( I cannot just include and exclude
    networks, because I have different machines on different networks.
    This is what I want to achieve:

            compute01            compute02          compute03
    ens3    192.168.100.104/24   10.0.0.227/24      192.168.100.105/24
    ens8    10.0.0.228/24        172.21.1.128/24    ---
    ens9    172.21.1.155/24      ---                ---

    So I'm on compute01, spawning processes on compute02 and
    compute03, with both MPI_Comm_spawn and `mpirun -n 3 -host
    compute01,compute02,compute03 hostname`.

    Then when I include the MCA parameters I get this:
    `mpirun --oversubscribe --allow-run-as-root -n 3 --mca
    oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24 -host
    compute01,compute02,compute03 hostname`
    WARNING: An invalid value was given for oob_tcp_if_include. This
    value will be ignored.
    ...
    Message:    Did not find interface matching this subnet

    This would all work if Open MPI used the system's routing table,
    as `ip route` does.

    Best regards,
    Carlos.
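
One plausible reading of the "Did not find interface matching this subnet" warning is that each host checks the include list against its own interfaces, and no single host here has an address in both listed subnets except compute01. The sketch below only illustrates that subnet-matching idea with Python's standard `ipaddress` module, using the interface addresses from the table above. It is not Open MPI's actual implementation, and the helper name `unmatched_subnets` is hypothetical.

```python
import ipaddress

# Interface addresses copied from the table in this message.
HOSTS = {
    "compute01": {
        "ens3": "192.168.100.104/24",
        "ens8": "10.0.0.228/24",
        "ens9": "172.21.1.155/24",
    },
    "compute02": {"ens3": "10.0.0.227/24", "ens8": "172.21.1.128/24"},
    "compute03": {"ens3": "192.168.100.105/24"},
}

# Subnets passed to oob_tcp_if_include in the mpirun command above.
INCLUDE = ["10.0.0.0/24", "192.168.100.0/24"]


def unmatched_subnets(interfaces, subnets):
    """Return the subnets that contain none of this host's addresses.

    Hypothetical helper for illustration; it does not mirror Open MPI code.
    """
    addrs = [ipaddress.ip_interface(a).ip for a in interfaces.values()]
    missing = []
    for subnet in subnets:
        net = ipaddress.ip_network(subnet)
        if not any(ip in net for ip in addrs):
            missing.append(subnet)
    return missing


if __name__ == "__main__":
    for host, ifaces in HOSTS.items():
        print(host, unmatched_subnets(ifaces, INCLUDE))
```

Under that assumption, compute02 has no interface in 192.168.100.0/24 and compute03 has none in 10.0.0.0/24, which would be consistent with the warning being raised on those hosts.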




_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
