[OMPI users] performance abnormality with openib and tcp framework

2018-05-13 Thread Blade Shieh
/** The problem ***/

I have a cluster with 10 GbE Ethernet and 100 Gb InfiniBand. While running my
application (CAMx), I found that the performance with IB is not as good as
with Ethernet. That is confusing, because IB latency and bandwidth are
undoubtedly better than Ethernet's, as confirmed by the IMB-MPI1 and OSU MPI
benchmarks.



/** software stack ***/

CentOS 7.4 with kernel 4.11.0-45.6.1.el7a.aarch64

MLNX_OFED_LINUX-4.3-1.0.1.0 from
http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

gnu7.3 from OpenHPC release.   yum install
gnu7-compilers-ohpc-7.3.0-43.1.aarch64

openmpi3 from OpenHPC release.  yum install
openmpi3-gnu7-ohpc-3.0.0-36.4.aarch64

CAMx 6.4.0 from http://www.camx.com/

IMB from https://github.com/intel/mpi-benchmarks

OSU from http://mvapich.cse.ohio-state.edu/benchmarks/





/** command lines are /



(time mpirun --allow-run-as-root -mca btl self,openib  -x OMP_NUM_THREADS=2
-n 32 -mca btl_tcp_if_include eth2
../../src/CAMx.v6.40.openMPI.gfortranomp.ompi) > camx_openib_log 2>&1

(time mpirun --allow-run-as-root -mca btl self,tcp  -x OMP_NUM_THREADS=2 -n
32 -mca btl_tcp_if_include eth2
../../src/CAMx.v6.40.openMPI.gfortranomp.ompi) > camx_tcp_log 2>&1



(time mpirun --allow-run-as-root -mca btl self,openib  -x OMP_NUM_THREADS=2
-n 32 -mca btl_tcp_if_include eth2 IMB-MPI1 allreduce -msglog 8 -npmin
1000) > IMB_openib_log 2>&1

(time mpirun --allow-run-as-root -mca btl self,tcp  -x OMP_NUM_THREADS=2 -n
32 -mca btl_tcp_if_include eth2 IMB-MPI1 allreduce -msglog 8 -npmin 1000) >
IMB_tcp_log 2>&1



(time mpirun --allow-run-as-root -mca btl self,openib  -x OMP_NUM_THREADS=2
-n 32 -mca btl_tcp_if_include eth2 osu_latency) > osu_openib_log 2>&1

(time mpirun --allow-run-as-root -mca btl self,tcp  -x OMP_NUM_THREADS=2 -n
32 -mca btl_tcp_if_include eth2 osu_latency) > osu_tcp_log 2>&1



/** about openmpi and network config */



Please refer to relevant log files in the attachment.



Best Regards,

Xie Bin


ompi_support.tar.bz2
Description: application/bzip2
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Building Open MPI and default hostfile change does not go through

2018-05-13 Thread Gilles Gouaillardet

Konstantinos,


Since you ran

configure --prefix=/usr/local

the system-wide config file should be in

/usr/local/etc/openmpi-default-hostfile


Note /usr/local is the default prefix, so you do not even need the 
--prefix=/usr/local option
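
For example, a minimal sketch of the two system-wide files under that prefix
(the hostnames are only placeholders):

# /usr/local/etc/openmpi-mca-params.conf
orte_default_hostfile = /usr/local/etc/openmpi-default-hostfile

# /usr/local/etc/openmpi-default-hostfile
node1 slots=1
node2 slots=1
node3 slots=1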



Cheers,


Gilles


On 5/12/2018 6:58 AM, Konstantinos Konstantinidis wrote:

Yes, exactly. The hostfile I have is of the form

node1 slots=1
node2 slots=1
node3 slots=1

where the above hostnames are resolved in the ~/.ssh/config file, which has
entries of the form


Host node1
 HostName 192.168.0.100
 User ubuntu
 IdentityFile ~/.ssh/mykey.pem

and so on.

So mpirun cannot pick up the hostfile by itself, and I have to specify it
each time.



On Fri, May 11, 2018 at 4:02 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:


Can you provide some more detail?  I'm not able to get this to
fail (i.e., it seems to be working as expected for me).

For example, what's the contents of your
/etc/openmpi/openmpi-default-hostfile -- did you list some
hostnames in there?


> On May 11, 2018, at 4:43 AM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>
> Hi,
>
> I have built Open MPI 2.1.2 multiple times on Ubuntu 16.04 and
then I add the line
>
> orte_default_hostfile=/etc/openmpi/openmpi-default-hostfile
>
> to the file
>
> /etc/openmpi/openmpi-mca-params.conf
>
> and I execute
>
> sudo chown myUsername /etc/openmpi/openmpi-default-hostfile
>
> For some reason this change never goes through and each time I
run a program with mpirun only one local process runs. So I have
to manually specify my hostname with the --hostfile argument.
>
> What can be the cause of this?
>
> The exact series of commands I use for building is the following
>
> sudo apt-get update
> sudo apt-get upgrade
> sudo apt-get install g++
> sudo apt-get install valgrind
> sudo apt-get install libopenmpi-dev
> sudo apt-get install gfortran
> sudo apt-get install make
> wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.gz
> tar -xvf openmpi-* && cd openmpi-*
> ./configure --prefix=/usr/local --enable-mpi-cxx --enable-debug --enable-memchecker --with-valgrind=/usr
> sudo make all install
>
> Then, I add the following lines to the .bashrc file (Is this
necessary?)
>
> export PATH="$PATH:/usr/local/bin"
> export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
>
> for setting the path and library path, respectively and of
course reload .bashrc.
>
> Is the above way of installing Open MPI correct? I am really
wondering since I have no solid Linux knowledge.
> ___
> users mailing list
> users@lists.open-mpi.org 
> https://lists.open-mpi.org/mailman/listinfo/users



--
Jeff Squyres
jsquy...@cisco.com



___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-13 Thread Nathan Hjelm
I see several problems:

1) osu_latency only works with two procs.

2) You explicitly excluded shared-memory support by specifying only self and
openib (or tcp). If you just want to disable tcp or openib, use --mca btl ^tcp
or --mca btl ^openib.

Also, it looks like you have multiple active ports that are on different
subnets. You can use --mca btl_openib_if_include to set it to use a specific
device or devices (e.g. mlx5_0).

See this warning:

--
WARNING: There are more than one active ports on host 'localhost', but the
default subnet GID prefix was detected on more than one of these
ports.  If these ports are connected to different physical IB
networks, this configuration will fail in Open MPI.  This version of
Open MPI requires that every physically separate IB subnet that is
used between connected MPI processes must have different subnet ID
values.

Please see this FAQ entry for more details:

  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid

NOTE: You can turn off this warning by setting the MCA parameter
  btl_openib_warn_default_gid_prefix to 0.
--
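
Putting those suggestions together, the runs above could look something like
this (a sketch only; it assumes the IB device is mlx5_0, relies on --mca btl
^tcp to leave the self, vader shared-memory, and openib BTLs enabled, and runs
osu_latency on just two ranks):

# CAMx over IB only (tcp disabled, shared memory kept)
(time mpirun --allow-run-as-root -n 32 -x OMP_NUM_THREADS=2 \
    -mca btl ^tcp -mca btl_openib_if_include mlx5_0 \
    ../../src/CAMx.v6.40.openMPI.gfortranomp.ompi) > camx_openib_log 2>&1

# osu_latency needs exactly two procs
(time mpirun --allow-run-as-root -n 2 -x OMP_NUM_THREADS=2 \
    -mca btl ^tcp -mca btl_openib_if_include mlx5_0 \
    osu_latency) > osu_openib_log 2>&1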


-Nathan




___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Building Open MPI and default hostfile change does not go through

2018-05-13 Thread Konstantinos Konstantinidis
Thank you Gilles,

One more question. Do I need to add the following lines to the .bashrc file
after installing Open MPI?

export PATH="$PATH:/usr/local/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Building Open MPI and default hostfile change does not go through

2018-05-13 Thread Gilles Gouaillardet

Konstantinos,


You need to double-check that; your OS might have done it out of the box
for you already.



Once logged in, you can run

which mpirun

If it resolves to /usr/local/bin/mpirun, then there is no need to update
$PATH. Then run

ldd /usr/local/bin/mpirun

If it correctly resolves to /usr/local/lib/libopen-pal.so and friends, then
there is no need to update $LD_LIBRARY_PATH either.
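
If you want to script it, both checks can be combined into one snippet (a
sketch only, assuming the /usr/local prefix used in this thread):

# check whether the /usr/local Open MPI install is already on the default paths
if [ "$(which mpirun)" = /usr/local/bin/mpirun ] && \
   ldd /usr/local/bin/mpirun | grep -q /usr/local/lib/libopen-pal; then
    echo "PATH and LD_LIBRARY_PATH already find the /usr/local install"
else
    echo "add the export lines from your previous mail to ~/.bashrc"
fi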


Cheers,


Gilles


On 5/14/2018 12:46 PM, Konstantinos Konstantinidis wrote:

Thank you Gilles,

One more question. Do I need to add the following lines to the .bashrc 
file after installing Open MPI?


export PATH="$PATH:/usr/local/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"



Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-13 Thread Max Mellette
Hi Gilles,

Thanks for the suggestions; the results are below. Any ideas where to go
from here?

- It seems that SELinux is not installed:

user@b09-30:~$ sestatus
The program 'sestatus' is currently not installed. You can install it by
typing:
sudo apt install policycoreutils

- Output from orted:

user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
[b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
ess_env_module.c at line 147
[b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file
util/session_dir.c at line 106
[b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file
util/session_dir.c at line 345
[b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file
base/ess_base_std_orted.c at line 270
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
--

- iptables rules:

user@b09-30:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source   destination
ufw-before-logging-input  all  --  anywhere anywhere
ufw-before-input  all  --  anywhere anywhere
ufw-after-input  all  --  anywhere anywhere
ufw-after-logging-input  all  --  anywhere anywhere
ufw-reject-input  all  --  anywhere anywhere
ufw-track-input  all  --  anywhere anywhere

Chain FORWARD (policy ACCEPT)
target prot opt source   destination
ufw-before-logging-forward  all  --  anywhere anywhere
ufw-before-forward  all  --  anywhere anywhere
ufw-after-forward  all  --  anywhere anywhere
ufw-after-logging-forward  all  --  anywhere anywhere
ufw-reject-forward  all  --  anywhere anywhere
ufw-track-forward  all  --  anywhere anywhere

Chain OUTPUT (policy ACCEPT)
target prot opt source   destination
ufw-before-logging-output  all  --  anywhere anywhere
ufw-before-output  all  --  anywhere anywhere
ufw-after-output  all  --  anywhere anywhere
ufw-after-logging-output  all  --  anywhere anywhere
ufw-reject-output  all  --  anywhere anywhere
ufw-track-output  all  --  anywhere anywhere

Chain ufw-after-forward (1 references)
target prot opt source   destination

Chain ufw-after-input (1 references)
target prot opt source   destination

Chain ufw-after-logging-forward (1 references)
target prot opt source   destination

Chain ufw-after-logging-input (1 references)
target prot opt source   destination

Chain ufw-after-logging-output (1 references)
target prot opt source   destination

Chain ufw-after-output (1 references)
target prot opt source   destination

Chain ufw-before-forward (1 references)
target prot opt source   destination

Chain ufw-before-input (1 references)
target prot opt source   destination

Chain ufw-before-logging-forward (1 references)
target prot opt source   destination

Chain ufw-before-logging-input (1 references)
target prot opt source   destination

Chain ufw-before-logging-output (1 references)
target prot opt source   destination

Chain ufw-before-output (1 references)
target prot opt source   destination

Chain ufw-reject-forward (1 references)
target prot opt source   destination

Chain ufw-reject-input (1 references)
target prot opt source   destination

Chain ufw-reject-output (1 references)
target prot opt source   destination

Chain ufw-track-forward (1 references)
target prot opt source   destination

Chain ufw-track-input (1 references)
target prot opt source   destination

Chain ufw-track-output (1 references)
target prot opt source   destination


Thanks,
Max
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users