Re: [OMPI users] MPI Datatypes and RMA

2016-05-02 Thread Gilles Gouaillardet

Bruce,


This issue was previously fixed on master and v2.x, but for some 
reason the fix was not backported to v1.10.


I made a PR at https://github.com/open-mpi/ompi-release/pull/1120/files

In the meantime, feel free to manually apply the patch at 
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1120.patch
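
For anyone applying the patch by hand, a minimal sketch of one way to do it follows. It assumes the patch has been saved locally and that it applies with -p1 from the top of the source tree (usual for GitHub .patch files, but not confirmed in this thread); the function name is made up for illustration.

```shell
#!/bin/sh
# Apply a patch to a source tree only if it applies cleanly.
#   $1: top-level source directory (for example an unpacked v1.10 tarball)
#   $2: absolute path to the saved .patch file (absolute because we cd first)
apply_patch_checked() {
    (
        cd "$1" || exit 1
        # --dry-run checks the patch without modifying any file
        patch -p1 --dry-run < "$2" >/dev/null || exit 1
        # The dry run succeeded, so apply for real
        patch -p1 < "$2"
    )
}
```

A possible use: save the .patch URL above to a file (for instance with curl -L -o), call apply_patch_checked with the Open MPI source directory and the absolute path of that file, then rebuild and reinstall. The directory and file locations are up to you.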



Cheers,


Gilles


On 4/30/2016 7:40 AM, Palmer, Bruce J wrote:


I’ve been trying to recreate the semantics of the Global Array gather 
and scatter operations using MPI RMA routines and I’ve run into some 
issues with MPI Datatypes. I’ve been focusing on building MPI versions 
of the GA gather and scatter calls, which I’ve been trying to 
implement using MPI data types built with the MPI_Type_create_struct 
call. I’ve developed a test program that simulates copying data into 
and out of a 1D distributed array of size NSIZE. Each processor 
holds a segment of approximately NSIZE/nproc elements and is 
responsible for assigning every nproc-th value in the array, starting 
with the value whose index equals its rank. After assigning 
values and synchronizing the distributed data structure, each 
processor then reads the values set by the processor of next higher 
rank (the process with rank nproc-1 reads the values set by process 0).
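
The access pattern described above can be sketched with plain shell arithmetic (no MPI involved). The function names and the sample values nproc=4 and NSIZE=10 are illustrative assumptions, not taken from the attached test program.

```shell
#!/bin/sh
# Global indices that a given rank assigns: rank, rank+nproc, rank+2*nproc, ...
#   $1: rank   $2: number of processes   $3: array size
indices_written_by() {
    _rank=$1; _nproc=$2; _nsize=$3
    _i=$_rank
    _out=""
    while [ "$_i" -lt "$_nsize" ]; do
        _out="$_out${_out:+ }$_i"      # space-separated list of indices
        _i=$((_i + _nproc))            # stride by the number of processes
    done
    printf '%s\n' "$_out"
}

# Rank whose values a given rank reads back: the next higher rank,
# wrapping around so that rank nproc-1 reads from rank 0.
#   $1: rank   $2: number of processes
rank_read_by() {
    printf '%s\n' "$(( ($1 + 1) % $2 ))"
}

# Illustrative values only
nproc=4
nsize=10
r=0
while [ "$r" -lt "$nproc" ]; do
    echo "rank $r writes [$(indices_written_by "$r" "$nproc" "$nsize")], reads from rank $(rank_read_by "$r" "$nproc")"
    r=$((r + 1))
done
```

With these sample values, rank 1 writes indices 1, 5 and 9 and then reads the values written by rank 2.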


The distributed array is represented by an MPI window created with a 
standard MPI_Win_create call. The values in the array are set and read 
using MPI RMA operations, either MPI_Get/MPI_Put or MPI_Rget/MPI_Rput. 
Three different protocols have been used:

1. Lock/unlock: call MPI_Win_lock to create a shared lock on the remote 
   processor, call MPI_Put/MPI_Get, then call MPI_Win_unlock to clear 
   the lock.
2. Request-based: after the call to MPI_Win_create, call 
   MPI_Win_lock_all to start a passive synchronization epoch on the 
   window. Data is written to and read from the distributed array using 
   MPI_Rput/MPI_Rget, immediately followed by a call to MPI_Wait on the 
   handle returned by the MPI_Rput/MPI_Rget call.
3. flush_local: also create a passive synchronization epoch immediately 
   after window creation, but use MPI_Put/MPI_Get immediately followed 
   by a call to MPI_Win_flush_local.

These three protocols seem to cover all the possibilities that I have 
seen in other MPI/RMA based implementations of ARMCI/GA.


The issue that I’ve run into is that these tests seem to work reliably 
if I build the data type using the MPI_Type_create_subarray function 
but fail for larger arrays (NSIZE ~ 1) when I use 
MPI_Type_create_struct. Because the values being set by each processor 
are evenly spaced, I can use either function in this case (this is not 
generally true in applications). The struct data type hangs on 2 
processors using lock/unlock, crashes for the request-based protocol 
and does not get the correct values in the Get phase of the data 
transfer when using flush_local. These tests are done on a Linux 
cluster using an InfiniBand interconnect and the value of NSIZE is 
1. For comparison, the same test using MPI_Type_create_subarray 
seems to function reliably for all three protocols for NSIZE=100 
using 1, 2, and 8 processors on 1 and 2 SMP nodes.


I’ve attached the test program for these test cases. Does anyone have 
a suggestion about what might be going on here?


Bruce



___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/04/29059.php




[OMPI users] OpenSHMEM + STM Linking Problem

2016-05-02 Thread RYAN RAY
On my computer I have installed both Open MPI and TinySTM. I have written a 
code that makes both shmem and Software Transactional Memory (STM) calls. When 
I compile the code using oshcc, it reports "stm.h not found". Could anyone 
please help me with this?

Regards

RYAN SAPTARSHI RAY

Re: [OMPI users] OpenSHMEM + STM Linking Problem

2016-05-02 Thread Jeff Squyres (jsquyres)
stm.h is not a header file in either Open MPI or OpenSHMEM.  Is that a TinySTM 
header file?

If you're having a problem with compiling TinySTM applications, you should 
probably contact their support channels -- we don't know/can't help with that.  
Sorry.





-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] OpenSHMEM + STM Linking Problem

2016-05-02 Thread RYAN RAY
Dear Jeff

Yes, stm.h is a TinySTM header file. My question is: is it possible to use 
both shmem and TinySTM calls in the same code?

Regards

Ryan



Re: [OMPI users] OpenSHMEM + STM Linking Problem

2016-05-02 Thread Gilles Gouaillardet
Ryan,

I do not know if that can work, but you should at least be able to compile
your application.
If you use the MPI wrappers (e.g. mpicc and friends), then you likely have to
explicitly set the STM include path and library.

For example:
mpicc -I$STM_HOME/include myapp.c -L$STM_HOME/lib -lstm
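
Since the original question used oshcc rather than mpicc, the same idea can be sketched for that wrapper. This is only a sketch: STM_HOME, its include/ and lib/ subdirectories, and the -lstm library name are assumptions carried over from the mpicc example, and the helper function names are made up.

```shell
#!/bin/sh
# STM_HOME is a placeholder for wherever TinySTM was built or installed;
# the include/ and lib/ layout and the -lstm name may differ on your system.
STM_HOME="${STM_HOME:-$HOME/tinySTM}"

# Flag that lets the compiler find stm.h
stm_compile_flags() { printf '%s\n' "-I$STM_HOME/include"; }

# Flags that let the linker find and link the STM library
stm_link_flags() { printf '%s\n' "-L$STM_HOME/lib -lstm"; }

# Print the command rather than run it, so the sketch works without oshcc.
echo "oshcc $(stm_compile_flags) myapp.c $(stm_link_flags)"
```

If the header is still not found, running the wrapper with --showme prints the underlying compiler command without executing it, which helps confirm that the -I path is being passed through.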

Cheers,

Gilles



Re: [OMPI users] users Digest, Vol 3489, Issue 1

2016-05-02 Thread Palmer, Bruce J
Gilles,

I downloaded and built openmpi-2.0.0rc2 and used that for the test. I get a 
crash on more than 1 processor for the lock/unlock protocol with the error 
message

[node005:29916] *** An error occurred in MPI_Win_lock
[node005:29916] *** reported by process [3736862721,6]
[node005:29916] *** on win rdma window 3
[node005:29916] *** MPI_ERR_RMA_SYNC: error executing rma sync
[node005:29916] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[node005:29916] ***    and potentially your MPI job)

and the request-based protocol hangs on the MPI_Rget call. The flush_local 
protocol seems to work though. Unlike 1.8.3, the problems seem to occur no 
matter what the value of NSIZE is. Should I try actually building 1.10 after 
applying the patch to it?

Bruce



Re: [OMPI users] users Digest, Vol 3489, Issue 1

2016-05-02 Thread Nathan Hjelm

It's not really a good idea to mix active and passive synchronization (we
may actually explicitly forbid it in the future). You can remove the
calls to MPI_Win_fence () and still have correct synchronization. That
said, you did find a bug in my bad synchronization detection because
this is legal:

MPI_Win_fence (...);
MPI_Win_lock (...);

but this is not

MPI_Win_fence (...);
MPI_Put (...); /* MPI_Get, MPI_Accumulate, etc */
MPI_Win_lock (...);

I will fix the bad synchronization detection in osc/rdma.

-Nathan



[OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Maciek Lewiński
Hi, I'm having a problem with Open MPI version 1.10.2.
I've installed two virtual machines on VirtualBox; both are the same image
of Ubuntu 12.04 64-bit.
Both have the same accounts, and both have everything configured almost exactly
the same.
I have configured OMPI only with --prefix to specify my install location,
which is /home/$USER/.openmpi.
Users on both machines are identical.

On both machines, running mpirun on one of the example programs works
perfectly. On both machines I've added bin and lib to the corresponding paths;
the env command run on master and through ssh on slave1 gives these results:
osboxes@osboxes:~/cloud$ env | grep PATH
LD_LIBRARY_PATH=:/usr/local/lib:/usr/local/lib:/home/osboxes/.openmpi/lib
PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin:/usr/local/bin:/home/osboxes/.openmpi/bin

On both hosts I have installed ssh, which works, and I've enabled passwordless
connection, which again works.

I have created an NFS folder in which I hold the hello_c program from
examples and a hosts file that looks like this:
192.168.0.191 master
192.168.0.190 slave1

The same two lines are placed in /etc/hosts for ssh to work.

On both machines, running:
osboxes@osboxes:~/cloud$ mpirun -np 1 ./hello_c
Hello, world, I am 0 of 1, (Open MPI v1.10.2, package: Open MPI
osboxes@osboxes Distribution, ident: 1.10.2, repo rev:
v1.10.1-145-g799148f, Jan 21, 2016, 126)

works. Even running this command on slave1 through ssh from master works as
expected.

Yet when I try to execute the following command I get the error:
osboxes@osboxes:~/cloud$ mpirun -np 2 --hostfile hosts ./hello_c
bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

The same happens when I use -host master,slave1 instead of --hostfile
hosts. I'm slowly running out of ideas; I've tried everything I could find on
the internet and in the OMPI FAQ, and nothing seems to work. What am I doing wrong?


Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Jeff Squyres (jsquyres)
The key is this error:

bash: orted: command not found

Meaning: you need to set your PATH and LD_LIBRARY_PATH properly for 
non-interactive logins.  See 
https://www.open-mpi.org/faq/?category=running#adding-ompi-to-path.







Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Maciek Lewiński
I already had the correct paths in .bashrc:

export PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin:/usr/local/bin:/home/$USER/.openmpi/bin

export LD_LIBRARY_PATH=:/usr/local/lib:/usr/local/lib:/home/$USER/.openmpi/lib

I can run MPI normally from slave1, so I'm sure they work. A moment ago I also
exported these paths in .profile just to be sure, but it didn't help.
Still the same error.





Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Jeff Squyres (jsquyres)
Make sure you check that these paths are set for *non-interactive* logins.
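
One common way for a PATH to be correct in a terminal yet missing for non-interactive logins is a guard near the top of ~/.bashrc that returns early when the shell is not interactive; anything exported below that guard is never reached when ssh starts a command. Whether that is the cause here is an assumption, not something this thread confirms. The sketch below reproduces the effect locally with two throwaway rc files; the /demo/openmpi/bin directory, the file names and the function name are made up for illustration, and no ssh or Open MPI is needed.

```shell
#!/bin/sh
demo_dir="$(mktemp -d)"

# rc file with the export AFTER an interactivity guard
cat > "$demo_dir/rc_after" <<'EOF'
# If not running interactively, don't do anything
[ -z "$PS1" ] && return
export PATH="$PATH:/demo/openmpi/bin"
EOF

# Same lines, but with the export moved ABOVE the guard
cat > "$demo_dir/rc_before" <<'EOF'
export PATH="$PATH:/demo/openmpi/bin"
# If not running interactively, don't do anything
[ -z "$PS1" ] && return
EOF

# Source an rc file in a fresh non-interactive child shell and print the
# PATH that shell ends up with. PS1 is forced to be empty, as it would be
# in a non-interactive shell, so the demo does not depend on the caller.
# sh keeps the demo portable; these lines behave the same way in bash.
path_seen_by_noninteractive() {
    PS1='' sh -c '. "$1"; printf "%s\n" "$PATH"' _ "$1"
}

echo "export after guard : $(path_seen_by_noninteractive "$demo_dir/rc_after")"
echo "export before guard: $(path_seen_by_noninteractive "$demo_dir/rc_before")"
```

Only the second line of output should contain /demo/openmpi/bin. On the real machines, a comparable check would be to run something like ssh slave1 'echo $PATH; which orted' from master; if orted is not found that way, moving the two export lines above any such guard in ~/.bashrc on slave1 is one thing to try.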



Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Gilles Gouaillardet
If Open MPI is installed at the same path on every node, the easiest option
is to re-configure with
--enable-mpirun-prefix-by-default
Another option is to invoke mpirun by its full path, i.e. use
`which mpirun` instead of mpirun
and yet another option is to run
mpirun --prefix /home/$USER/.openmpi

Cheers,

Gilles
