Re: [OMPI users] MPI-IO puzzlement
Dear all,

I think I have found the culprit. I was calculating my offset as offset = my_rank*dim, where dim is the array size. Both my_rank and dim are plain integers, and there lies the rub: Fortran (or should I say gfortran?) multiplies my_rank*dim in integer*4 and only then converts the result to integer*8 (the offset type). By that point everything is lost, because the value being converted is already out of range.

I wrote a small test programme to confirm this, and doing

  offset = INT(my_rank,8)*INT(dim,8)

works fine there (I'm still checking whether the other programme has a further problem). Anyway, although it becomes obvious once tracked down, I think it is an easy pitfall for the unaware...

best,

Ricardo Reis

'Non Serviam'

PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering

Computational Fluid Dynamics, High Performance Computing, Turbulence
http://www.lasef.ist.utl.pt

Cultural Instigator @ Rádio Zero
http://www.radiozero.pt

http://www.flickr.com/photos/rreis/

contacts: gtalk: kyriu...@gmail.com skype: kyriusan

Institutional Address:

Ricardo J.N. dos Reis
IDMEC, Instituto Superior Técnico, Technical University of Lisbon
Av. Rovisco Pais
1049-001 Lisboa
Portugal

- email sent with alpine 2.00 -
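For what it's worth, the same pitfall exists in C. Below is a minimal sketch (not the original Fortran code, and assuming MPI_Offset is 64-bit on the build in question) that contrasts the narrow multiply with a widened one:

  /* Minimal sketch of the same pitfall in C (not the poster's code): the
   * product my_rank * dim is evaluated in 32-bit int arithmetic and can
   * overflow before it is widened to MPI_Offset; casting one operand
   * first forces a 64-bit multiply. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int my_rank;
      int dim = 600000000;   /* large per-rank element count (illustrative) */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      MPI_Offset bad  = my_rank * dim;              /* 32-bit multiply, may overflow */
      MPI_Offset good = (MPI_Offset)my_rank * dim;  /* widened before multiplying    */

      printf("rank %d: bad = %lld, good = %lld\n",
             my_rank, (long long)bad, (long long)good);

      MPI_Finalize();
      return 0;
  }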
[OMPI users] Returned "Unreachable" (-12) instead of "Success" (0)
Hello All,

I am trying to run an OpenMPI application across two physical machines. I get the error "Returned "Unreachable" (-12) instead of "Success" (0)", and looking through the logs (attached), I cannot work out the cause or how to fix it. I see a lot of (communication) components being loaded and then unloaded, and I do not see which nodes pick up which kind of comm-interface.

--
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

Process 1 ([[10782,1],6]) is on host: tik34x
Process 2 ([[10782,1],0]) is on host: tik33x
BTLs attempted: self sm tcp

Your MPI job is now going to abort; sorry.

The "mpirun" line is:

mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -report-pid -display-map -report-bindings -hostfile hostfile -np 7 -v --rankfile rankfile.txt -v --timestamp-output --tag-output ./xstartwrapper.sh ./run_gdb.sh

where the .sh files are fixes for forwarding X-windows from the multiple machines to the machine where I am logged in.

Can anyone help?

Thanks a lot.

Best,

Devendra
[OMPI users] MPI - memory sharing?
I have the following structure in my MPI code:

int main() {
    MPI_Init(...);                        // initialize MPI
    data = malloc(sizeof(int) * 200);     // allocate and initialize data

    /* Master */
    if (taskid == 0) {
        // send data to each slave
        MPI_Send(...);
    }

    /* Slaves */
    if (taskid > 0) {
        // accept data from master
        MPI_Recv(...);
        // do some calculations
    }

    MPI_Finalize();
}

I have a few doubts about the code above. The data is allocated in the main program. If I run this program on a cluster where node 1 is a slave and node 0 is the master, does node 1 actually share the memory location on node 0 to perform the calculations? If I do not want to share the memory, how can I make the task on node 1 work independently? Thanks in advance.
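For reference, a minimal compilable version of the sketch above (buffer size, message tag, and the summing "calculation" are illustrative assumptions, not the poster's actual values):

  /* Minimal compilable version of the sketch above; sizes, tag and the
   * computation are illustrative assumptions, not the poster's code. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define N 200   /* number of ints in the buffer (assumption) */

  int main(int argc, char **argv)
  {
      int taskid, ntasks;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
      MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

      int *data = malloc(sizeof(int) * N);   /* each rank allocates its own buffer */

      if (taskid == 0) {
          /* master: fill the buffer and send a copy to every slave */
          for (int i = 0; i < N; i++)
              data[i] = i;
          for (int dest = 1; dest < ntasks; dest++)
              MPI_Send(data, N, MPI_INT, dest, 0, MPI_COMM_WORLD);
      } else {
          /* slave: receive the data from the master, then compute on it */
          MPI_Recv(data, N, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          long sum = 0;
          for (int i = 0; i < N; i++)
              sum += data[i];
          printf("rank %d computed sum %ld\n", taskid, sum);
      }

      free(data);
      MPI_Finalize();
      return 0;
  }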
[OMPI users] fork() warning message
I'm getting the following warning with a new version of R, using Rmpi and a few other modules. I've already had a couple of good suggestions from this group about how to diagnose the cause of the fork error using "strace", but we don't have it on our LSF Linux cluster. This is my first use of R/MPI/parallel etc., so I am a bit naive. Also, the code I'm running involves random number generation, so it will always give slightly different answers.

My normal routine is to:
a) try the code with a small number of iterations on my own Linux/R/Open MPI PC using 8 cores, then
b) make the job bigger and run it on the cluster.

I only get the warning on the cluster, which suggests it is caused by something related to R and/or Rmpi and/or LSF and/or Open MPI?

Could someone suggest some rigorous R test code that I could run on my PC (it's OK if it takes some time) and then rerun on the cluster, to confirm that I get the same results and thus that the warning is inconsequential?

Thanks

Jim

=
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host: cn159.private.dns.zone (PID 12792)
  MPI_COMM_WORLD rank: 7

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.

--
Dr. Jim Maas
University of East Anglia
Re: [OMPI users] Returned "Unreachable" (-12) instead of "Success" (0)
Looks like you have a firewall between hosts tik34x and tik33x - you might check to ensure all firewalls are disabled. The error is saying it can't open a TCP socket between the two nodes, so there is no communication path between those two processes.

On May 16, 2012, at 4:22 AM, devendra rai wrote:

> Hello All,
>
> I am trying to run an OpenMPI application across two physical machines.
>
> I get an error "Returned "Unreachable" (-12) instead of "Success" (0)", and
> looking through the logs (attached), I cannot seem to find out the cause, and
> how to fix it.
>
> I see lot of (communication) components being loaded and then unloaded, and I
> do not see which nodes pick up what kind of comm-interface.
>
> --
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
> Process 1 ([[10782,1],6]) is on host: tik34x
> Process 2 ([[10782,1],0]) is on host: tik33x
> BTLs attempted: self sm tcp
>
> Your MPI job is now going to abort; sorry.
>
> The "mpirun" line is:
>
> mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -report-pid
> -display-map -report-bindings -hostfile hostfile -np 7 -v --rankfile
> rankfile.txt -v --timestamp-output --tag-output ./xstartwrapper.sh
> ./run_gdb.sh
>
> where the .sh files are fixes for forwarding X-windows from multiple machines
> to the machines where I am logged in.
>
> Can anyone help?
>
> Thanks a lot.
>
> Best,
>
> Devendra
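For anyone following up on the firewall suggestion: on a typical Linux node, commands like the following will show whether packet-filter rules are active and let you disable them for a quick test (iptables and the service name are assumptions about how these nodes are set up, not something reported in the thread):

  # list active firewall rules on each host (assumes iptables)
  iptables -L -n

  # temporarily disable the firewall for a test run
  # (assumes a RHEL-style init service; re-enable afterwards)
  service iptables stop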
Re: [OMPI users] fork() warning message
We've had a few reports of this - it looks like someone made a change to R that can cause problems. Basically, the OpenFabrics driver for InfiniBand doesn't support "fork" operations - it can lead to memory corruption issues if you inadvertently do the "wrong thing" at some point after the fork. Hence, we emit a warning if we see a "fork" operation when InfiniBand is being used with the OFED verbs driver.

You can suppress the warning by setting -mca mpi_warn_on_fork 0 on your cmd line. You will probably be okay, but just be aware you could hit issues.

On May 16, 2012, at 6:17 AM, Jim Maas wrote:

> I'm getting the following warning with a new version of R, using Rmpi and a
> few other modules. I've already had a couple of good suggestions from this
> group about how to diagnose the cause of the fork error using "strace", but
> we don't have it on our LSF Linux cluster. This is my first use of
> R/MPI/parallel etc., so I am a bit naive. Also, the code I'm running involves
> random number generation, so it will always give slightly different answers.
>
> My normal routine is to:
> a) try the code with a small number of iterations on my own Linux/R/Open MPI
>    PC using 8 cores, then
> b) make the job bigger and run it on the cluster.
>
> I only get the warning on the cluster, which suggests it is caused by
> something related to R and/or Rmpi and/or LSF and/or Open MPI?
>
> Could someone suggest some rigorous R test code that I could run on my PC
> (it's OK if it takes some time) and then rerun on the cluster, to confirm
> that I get the same results and thus that the warning is inconsequential?
>
> Thanks
>
> Jim
>
> =
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host: cn159.private.dns.zone (PID 12792)
>   MPI_COMM_WORLD rank: 7
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
>
> --
> Dr. Jim Maas
> University of East Anglia
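To make the suppression concrete, the flag goes on the mpirun invocation (the process count and job script name below are placeholders); the same parameter can also be set through Open MPI's usual OMPI_MCA_ environment-variable form, which is sometimes easier under a batch scheduler:

  mpirun -np 8 --mca mpi_warn_on_fork 0 ./my_rmpi_job.sh

  # or, equivalently, before launching:
  export OMPI_MCA_mpi_warn_on_fork=0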
Re: [OMPI users] MPI-IO puzzlement
All problems gone, thanks for the input and assistance.

cheers,

Ricardo Reis

'Non Serviam'

PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering

Computational Fluid Dynamics, High Performance Computing, Turbulence
http://www.lasef.ist.utl.pt

Cultural Instigator @ Rádio Zero
http://www.radiozero.pt

http://www.flickr.com/photos/rreis/

contacts: gtalk: kyriu...@gmail.com skype: kyriusan

Institutional Address:

Ricardo J.N. dos Reis
IDMEC, Instituto Superior Técnico, Technical University of Lisbon
Av. Rovisco Pais
1049-001 Lisboa
Portugal

- email sent with alpine 2.00 -
Re: [OMPI users] MPI-IO puzzlement
Excellent. The bug fix will be in 1.6.1, too.

On May 16, 2012, at 1:26 PM, Ricardo Reis wrote:

> All problems gone, thanks for the input and assistance.
>
> cheers,
>
> Ricardo Reis
>
> 'Non Serviam'
>
> PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering
>
> Computational Fluid Dynamics, High Performance Computing, Turbulence
> http://www.lasef.ist.utl.pt
>
> Cultural Instigator @ Rádio Zero
> http://www.radiozero.pt
>
> http://www.flickr.com/photos/rreis/
>
> contacts: gtalk: kyriu...@gmail.com skype: kyriusan
>
> Institutional Address:
>
> Ricardo J.N. dos Reis
> IDMEC, Instituto Superior Técnico, Technical University of Lisbon
> Av. Rovisco Pais
> 1049-001 Lisboa
> Portugal
>
> - email sent with alpine 2.00 -

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] MPI_THREAD_MULTIPLE support not available (OpenMPI-1.4.4)
Hello Community,

I just finished building an OpenMPI version with a custom configuration (1.4.4) on a server. I do not have root privileges on the server. The server already had a "default" version of OpenMPI, which seemed to be single-threaded:

ompi_info | grep Thread

yields

Thread support: posix (mpi: no, progress: no)

For my custom build, I have MPI thread support (precisely, support for MPI_THREAD_MULTIPLE):

/path/to/Custom/OpenMPI/bin/ompi_info | grep Thread

yields

Thread support: posix (mpi: yes, progress: no)

Thus, I run my application as:

/path/to/Custom/OpenMPI/bin/mpirun [options]...

In my application, I check whether I am getting MPI_THREAD_MULTIPLE support or not:

MPI_Init_thread(&argc, &argv, desired_thread_support, &provided_thread_support); /* need provided_thread_support = desired_thread_support */
printf("MPI: Requested thread support: %d, MPI thread support received: %d\n", MPI_THREAD_MULTIPLE, provided_thread_support);

When I run the application, I get:

MPI: Requested thread support: 3, MPI thread support received: 0

I have tried a lot of things, and I am not sure why this is the case. At the beginning of the shell script which contains /path/to/Custom/OpenMPI/bin/mpirun [options]..., I set the paths and LD_LIBRARY_PATH:

PATH=/home/raid/private/Customizations_NODELETE/OpenMPI-1.4.4/bin:$PATH
export LD_LIBRARY_PATH=/home/raid/private/Customizations_NODELETE/gcc-4.6/lib64:/home/Morpheus/.CustomInstalledPrograms/OpenMPI-1.4.4/lib:/home/raid/private/Customizations_NODELETE/gcc-4.6/lib32

So, I think that the correct version of mpirun is being used. Can someone tell me why I cannot get MPI_THREAD_MULTIPLE support? I have the same OpenMPI version on my notebook, which does provide me with MPI_THREAD_MULTIPLE.

Thanks a lot.

Best,

Devendra
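For reference, a self-contained version of the check described above (a minimal sketch, not the poster's application: it simply requests MPI_THREAD_MULTIPLE and reports what the library grants):

  /* Minimal sketch of the thread-support check; not the poster's
   * application, just a standalone probe. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;

      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

      printf("MPI: requested thread support: %d (MPI_THREAD_MULTIPLE), received: %d\n",
             MPI_THREAD_MULTIPLE, provided);

      if (provided < MPI_THREAD_MULTIPLE) {
          fprintf(stderr, "MPI_THREAD_MULTIPLE not provided by this build/run\n");
      }

      MPI_Finalize();
      return 0;
  }

Compiled and launched with the custom build's own wrappers (e.g. /path/to/Custom/OpenMPI/bin/mpicc and .../mpirun), a probe like this makes it easier to see whether the intended library is the one actually picked up at run time.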