Re: [OMPI users] MPI-IO puzzlement
Dear all,

I think I have found the culprit. I was calculating my offset as offset = my_rank*dim, where dim is the array size. Both my_rank and dim are plain integers, and there lies the rub: Fortran (or should I say gfortran?) multiplies my_rank*dim in integer*4 and only then converts the result to integer*8 (the offset type). By that point everything is lost, because the value being converted is already out of range.

I wrote a small test programme to confirm this, and doing

  offset = INT(my_rank,8)*INT(dim,8)

works fine there (I'm still checking whether the other programme has a further problem). Anyway, although it becomes obvious once tracked down, I think it is an easy pitfall for the unaware...

best,

Ricardo Reis

'Non Serviam'

PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering

Computational Fluid Dynamics, High Performance Computing, Turbulence
http://www.lasef.ist.utl.pt

Cultural Instigator @ Rádio Zero
http://www.radiozero.pt

http://www.flickr.com/photos/rreis/

contacts: gtalk: kyriu...@gmail.com skype: kyriusan

Institutional Address:

Ricardo J.N. dos Reis
IDMEC, Instituto Superior Técnico, Technical University of Lisbon
Av. Rovisco Pais
1049-001 Lisboa
Portugal

- email sent with alpine 2.00 -
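For what it's worth, the same pitfall exists in C. Below is a minimal sketch (not the original Fortran code, and assuming MPI_Offset is 64-bit on the build in question) that contrasts the narrow multiply with a widened one:

  /* Minimal sketch of the same pitfall in C (not the poster's code): the
   * product my_rank * dim is evaluated in 32-bit int arithmetic and can
   * overflow before it is widened to MPI_Offset; casting one operand
   * first forces a 64-bit multiply. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int my_rank;
      int dim = 600000000;   /* large per-rank element count (illustrative) */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      MPI_Offset bad  = my_rank * dim;              /* 32-bit multiply, may overflow */
      MPI_Offset good = (MPI_Offset)my_rank * dim;  /* widened before multiplying    */

      printf("rank %d: bad = %lld, good = %lld\n",
             my_rank, (long long)bad, (long long)good);

      MPI_Finalize();
      return 0;
  }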
[OMPI users] Returned "Unreachable" (-12) instead of "Success" (0)
Hello All,

I am trying to run an OpenMPI application across two physical machines. I get the error "Returned "Unreachable" (-12) instead of "Success" (0)", and looking through the logs (attached), I cannot work out the cause or how to fix it. I see a lot of (communication) components being loaded and then unloaded, and I do not see which nodes pick up which kind of comm-interface.

--
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

Process 1 ([[10782,1],6]) is on host: tik34x
Process 2 ([[10782,1],0]) is on host: tik33x
BTLs attempted: self sm tcp

Your MPI job is now going to abort; sorry.

The "mpirun" line is:

mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -report-pid -display-map -report-bindings -hostfile hostfile -np 7 -v --rankfile rankfile.txt -v --timestamp-output --tag-output ./xstartwrapper.sh ./run_gdb.sh

where the .sh files are fixes for forwarding X-windows from the multiple machines to the machine where I am logged in.

Can anyone help?

Thanks a lot.

Best,

Devendra
[OMPI users] MPI - memory sharing?
I have the following structure in my MPI code:

int main() {
    MPI_Init(...);                        // initialize MPI
    data = malloc(sizeof(int) * 200);     // allocate and initialize data

    /* Master */
    if (taskid == 0) {
        // send data to each slave
        MPI_Send(...);
    }

    /* Slaves */
    if (taskid > 0) {
        // accept data from master
        MPI_Recv(...);
        // do some calculations
    }

    MPI_Finalize();
}

I have a few doubts about the code above. The data is allocated in the main program. If I run this program on a cluster where node 1 is a slave and node 0 is the master, does node 1 actually share the memory location on node 0 to perform the calculations? If I do not want to share the memory, how can I make the task on node 1 work independently? Thanks in advance.
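For reference, a minimal compilable version of the sketch above (buffer size, message tag, and the summing "calculation" are illustrative assumptions, not the poster's actual values):

  /* Minimal compilable version of the sketch above; sizes, tag and the
   * computation are illustrative assumptions, not the poster's code. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define N 200   /* number of ints in the buffer (assumption) */

  int main(int argc, char **argv)
  {
      int taskid, ntasks;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
      MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

      int *data = malloc(sizeof(int) * N);   /* each rank allocates its own buffer */

      if (taskid == 0) {
          /* master: fill the buffer and send a copy to every slave */
          for (int i = 0; i < N; i++)
              data[i] = i;
          for (int dest = 1; dest < ntasks; dest++)
              MPI_Send(data, N, MPI_INT, dest, 0, MPI_COMM_WORLD);
      } else {
          /* slave: receive the data from the master, then compute on it */
          MPI_Recv(data, N, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          long sum = 0;
          for (int i = 0; i < N; i++)
              sum += data[i];
          printf("rank %d computed sum %ld\n", taskid, sum);
      }

      free(data);
      MPI_Finalize();
      return 0;
  }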
[OMPI users] fork() warning message
I'm getting the following warning with a new version of R, using Rmpi and a few other modules. I've already had a couple of good suggestions from this group about how to diagnose the cause of the fork error using "strace", but we don't have it on our LSF Linux cluster. This is my first use of R/MPI/parallel etc., so I am a bit naive. Also, the code I'm running involves random number generation, so it will always give slightly different answers.

My normal routine is to:
a) try the code with a small number of iterations on my own Linux/R/Open MPI PC using 8 cores, then
b) make the job bigger and run it on the cluster.

I only get the warning on the cluster, which suggests it is caused by something related to R and/or Rmpi and/or LSF and/or Open MPI?

Could someone suggest some rigorous R test code that I could run on my PC (it's OK if it takes some time) and then rerun on the cluster, to confirm that I get the same results and thus that the warning is inconsequential?

Thanks

Jim

=
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host: cn159.private.dns.zone (PID 12792)
  MPI_COMM_WORLD rank: 7

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.

--
Dr. Jim Maas
University of East Anglia
Re: [OMPI users] Returned "Unreachable" (-12) instead of "Success" (0)
Looks like you have a firewall between hosts tik34x and tik33x - you might check to ensure all firewalls are disabled. The error is saying it can't open a TCP socket between the two nodes, so there is no communication path between those two processes.

On May 16, 2012, at 4:22 AM, devendra rai wrote:

> Hello All,
>
> I am trying to run an OpenMPI application across two physical machines.
>
> I get an error "Returned "Unreachable" (-12) instead of "Success" (0)", and
> looking through the logs (attached), I cannot seem to find out the cause, and
> how to fix it.
>
> I see lot of (communication) components being loaded and then unloaded, and I
> do not see which nodes pick up what kind of comm-interface.
>
> --
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
> Process 1 ([[10782,1],6]) is on host: tik34x
> Process 2 ([[10782,1],0]) is on host: tik33x
> BTLs attempted: self sm tcp
>
> Your MPI job is now going to abort; sorry.
>
> The "mpirun" line is:
>
> mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -report-pid
> -display-map -report-bindings -hostfile hostfile -np 7 -v --rankfile
> rankfile.txt -v --timestamp-output --tag-output ./xstartwrapper.sh
> ./run_gdb.sh
>
> where the .sh files are fixes for forwarding X-windows from multiple machines
> to the machines where I am logged in.
>
> Can anyone help?
>
> Thanks a lot.
>
> Best,
>
> Devendra
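For anyone following up on the firewall suggestion: on a typical Linux node, commands like the following will show whether packet-filter rules are active and let you disable them for a quick test (iptables and the service name are assumptions about how these nodes are set up, not something reported in the thread):

  # list active firewall rules on each host (assumes iptables)
  iptables -L -n

  # temporarily disable the firewall for a test run
  # (assumes a RHEL-style init service; re-enable afterwards)
  service iptables stop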
Re: [OMPI users] fork() warning message
We've had a few reports of this - it looks like someone made a change to R that can cause problems. Basically, the OpenFabrics driver for InfiniBand doesn't support "fork" operations - it can lead to memory corruption issues if you inadvertently do the "wrong thing" at some point after the fork. Hence, we emit a warning if we see a "fork" operation when InfiniBand is being used with the OFED verbs driver.

You can suppress the warning by setting -mca mpi_warn_on_fork 0 on your cmd line. You will probably be okay, but just be aware you could hit issues.

On May 16, 2012, at 6:17 AM, Jim Maas wrote:

> I'm getting the following warning with a new version of R, using Rmpi and a
> few other modules. I've already had a couple of good suggestions from this
> group about how to diagnose the cause of the fork error using "strace", but
> we don't have it on our LSF Linux cluster. This is my first use of
> R/MPI/parallel etc., so I am a bit naive. Also, the code I'm running involves
> random number generation, so it will always give slightly different answers.
>
> My normal routine is to:
> a) try the code with a small number of iterations on my own Linux/R/Open MPI
>    PC using 8 cores, then
> b) make the job bigger and run it on the cluster.
>
> I only get the warning on the cluster, which suggests it is caused by
> something related to R and/or Rmpi and/or LSF and/or Open MPI?
>
> Could someone suggest some rigorous R test code that I could run on my PC
> (it's OK if it takes some time) and then rerun on the cluster, to confirm
> that I get the same results and thus that the warning is inconsequential?
>
> Thanks
>
> Jim
>
> =
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host: cn159.private.dns.zone (PID 12792)
>   MPI_COMM_WORLD rank: 7
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
>
> --
> Dr. Jim Maas
> University of East Anglia
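To make the suppression concrete, the flag goes on the mpirun invocation (the process count and job script name below are placeholders); the same parameter can also be set through Open MPI's usual OMPI_MCA_ environment-variable form, which is sometimes easier under a batch scheduler:

  mpirun -np 8 --mca mpi_warn_on_fork 0 ./my_rmpi_job.sh

  # or, equivalently, before launching:
  export OMPI_MCA_mpi_warn_on_fork=0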
Re: [OMPI users] MPI-IO puzzlement
All problems gone, thanks for the input and assistance.

cheers,

Ricardo Reis

'Non Serviam'

PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering

Computational Fluid Dynamics, High Performance Computing, Turbulence
http://www.lasef.ist.utl.pt

Cultural Instigator @ Rádio Zero
http://www.radiozero.pt

http://www.flickr.com/photos/rreis/

contacts: gtalk: kyriu...@gmail.com skype: kyriusan

Institutional Address:

Ricardo J.N. dos Reis
IDMEC, Instituto Superior Técnico, Technical University of Lisbon
Av. Rovisco Pais
1049-001 Lisboa
Portugal

- email sent with alpine 2.00 -
Re: [OMPI users] MPI-IO puzzlement
Excellent. The bug fix will be in 1.6.1, too.

On May 16, 2012, at 1:26 PM, Ricardo Reis wrote:

> All problems gone, thanks for the input and assistance.
>
> cheers,
>
> Ricardo Reis
>
> 'Non Serviam'
>
> PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering
>
> Computational Fluid Dynamics, High Performance Computing, Turbulence
> http://www.lasef.ist.utl.pt
>
> Cultural Instigator @ Rádio Zero
> http://www.radiozero.pt
>
> http://www.flickr.com/photos/rreis/
>
> contacts: gtalk: kyriu...@gmail.com skype: kyriusan
>
> Institutional Address:
>
> Ricardo J.N. dos Reis
> IDMEC, Instituto Superior Técnico, Technical University of Lisbon
> Av. Rovisco Pais
> 1049-001 Lisboa
> Portugal
>
> - email sent with alpine 2.00 -

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] MPI_THREAD_MULTIPLE support not available (OpenMPI-1.4.4)
Hello Community,

I just finished building an OpenMPI version with a custom configuration (1.4.4) on a server. I do not have root privileges on the server. The server already had a "default" version of OpenMPI, which seemed to be single-threaded:

ompi_info | grep Thread

yields

Thread support: posix (mpi: no, progress: no)

For my custom build, I have MPI thread support (precisely, support for MPI_THREAD_MULTIPLE):

/path/to/Custom/OpenMPI/bin/ompi_info | grep Thread

yields

Thread support: posix (mpi: yes, progress: no)

Thus, I run my application as:

/path/to/Custom/OpenMPI/bin/mpirun [options]...

In my application, I check whether I am getting MPI_THREAD_MULTIPLE support or not:

MPI_Init_thread(&argc, &argv, desired_thread_support, &provided_thread_support); /* need provided_thread_support = desired_thread_support */
printf("MPI: Requested thread support: %d, MPI thread support received: %d\n", MPI_THREAD_MULTIPLE, provided_thread_support);

When I run the application, I get:

MPI: Requested thread support: 3, MPI thread support received: 0

I have tried a lot of things, and I am not sure why this is the case. At the beginning of the shell script which contains /path/to/Custom/OpenMPI/bin/mpirun [options]..., I set the paths and LD_LIBRARY_PATH:

PATH=/home/raid/private/Customizations_NODELETE/OpenMPI-1.4.4/bin:$PATH
export LD_LIBRARY_PATH=/home/raid/private/Customizations_NODELETE/gcc-4.6/lib64:/home/Morpheus/.CustomInstalledPrograms/OpenMPI-1.4.4/lib:/home/raid/private/Customizations_NODELETE/gcc-4.6/lib32

So, I think that the correct version of mpirun is being used. Can someone tell me why I cannot get MPI_THREAD_MULTIPLE support? I have the same OpenMPI version on my notebook, which does provide me with MPI_THREAD_MULTIPLE.

Thanks a lot.

Best,

Devendra
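For reference, a self-contained version of the check described above (a minimal sketch, not the poster's application: it simply requests MPI_THREAD_MULTIPLE and reports what the library grants):

  /* Minimal sketch of the thread-support check; not the poster's
   * application, just a standalone probe. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;

      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

      printf("MPI: requested thread support: %d (MPI_THREAD_MULTIPLE), received: %d\n",
             MPI_THREAD_MULTIPLE, provided);

      if (provided < MPI_THREAD_MULTIPLE) {
          fprintf(stderr, "MPI_THREAD_MULTIPLE not provided by this build/run\n");
      }

      MPI_Finalize();
      return 0;
  }

Compiled and launched with the custom build's own wrappers (e.g. /path/to/Custom/OpenMPI/bin/mpicc and .../mpirun), a probe like this makes it easier to see whether the intended library is the one actually picked up at run time.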