Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Ralph Castain
That command line cannot possibly work. Both the -rf and --output-filename options require arguments. PLEASE read the documentation? mpirun -h, or "man mpirun" will tell you how to correctly use these options. On Mar 26, 2011, at 6:35 PM, Jack Bryan wrote: > Hi, I used : > > mpirun -np 200

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Jack Bryan
Hi, I used: mpirun -np 200 -rf --output-filename /mypath/myapplication But no files are printed out. Can the "--debug" option help me here? When I tried: -bash-3.2$ mpirun -debug -- A suitable debugger could not be found i

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Ralph Castain
If you use that mpirun option, mpirun will place the output from each rank into a -separate- file for you. Give it: mpirun --output-filename /myhome/debug/run01 and in /myhome/debug, you will find files: run01.0 run01.1 ... each with the output from the indicated rank. On Mar 26, 2011, at 3
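A minimal sketch of the command line described above, assuming the output prefix /myhome/debug/run01 from the message and a hypothetical executable named ./myapplication. It only assembles and prints the command (a dry run), so the arguments can be checked before anything is launched. The -rf option is left out because it needs a rankfile argument that the thread never supplies.

```shell
# Hypothetical values; adjust them to your own job.
np=200
out_prefix="/myhome/debug/run01"   # per the message: run01.0, run01.1, ...
app="./myapplication"

# --output-filename requires an argument: the filename prefix.
cmd="mpirun -np $np --output-filename $out_prefix $app"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

Once the printed line looks right, run it directly at the shell prompt; the directory part of the prefix (/myhome/debug here) presumably has to be writable from wherever mpirun writes the files.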

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Michele Marena
Yes, it works fine without shared memory. Thank you, Ralph. I will check the code for logical mistakes; otherwise I will use the option you suggested. 2011/3/26 Ralph Castain > Your other option is to simply not use shared memory. TCP contains loopback > support, so you can run with just > > -

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Jack Bryan
The cluster can print all output into one file, but checking it for bugs is very hard. The cluster also prints possible error messages into one file. But sometimes the error file is empty, and sometimes it shows signal 9. If I only run dummy tasks on worker nodes, no errors. If I run rea

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Ralph Castain
I don't know, but Ashley may be able to help - or you can see his web site for instructions. Alternatively, since you can put print statements into your code, have you considered using mpirun's option to direct output from each rank into its own file? Look at "mpirun -h" for the options. -o

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Ralph Castain
Your other option is to simply not use shared memory. TCP contains loopback support, so you can run with just -mca btl self,tcp and shared memory won't be used. It will run a tad slower that way, but at least your app will complete. On Mar 26, 2011, at 2:30 PM, Reuti wrote: > Am 26.03.2011 u
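A short sketch of the suggestion above, assuming a hypothetical executable ./myapplication and a process count of 4 (neither is given in the thread). It shows two ways of restricting the BTL list to self and tcp: on the mpirun command line, as in the message, and through the OMPI_MCA_btl environment variable. The command is only printed, not executed.

```shell
# Hypothetical values, not taken from the thread.
np=4
app="./myapplication"

# Option 1: select the BTLs on the command line, as suggested above.
cmd="mpirun -np $np -mca btl self,tcp $app"
echo "$cmd"

# Option 2: set the same MCA parameter through the environment,
# so a plain "mpirun -np $np $app" picks it up.
export OMPI_MCA_btl="self,tcp"
echo "OMPI_MCA_btl=$OMPI_MCA_btl"
```

The environment variable form may be the easier one when the job is started by a batch script that cannot be changed to add mpirun options.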

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Jack Bryan
Is it possible to have padb print the stack trace and other program execution information to a file? I can run the program in gdb like this: mpirun -np 200 -e gdb ./myapplication How can I make gdb print the debug information to a file, so that I can check it when the program is term

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Reuti
On 26.03.2011 at 21:16, Michele Marena wrote: > No, I can't. I'm not a administrator of the cluster and I'm not the owner. You can recompile your private version of Open MPI and install it in $HOME/local/openmpi-1.4.3 or the like, and set paths accordingly during compilation of your source and exe
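A rough sketch of the private install described above, assuming the prefix $HOME/local/openmpi-1.4.3 from the message and the usual configure/make build of an unpacked Open MPI source tree. The build steps are left as comments so the snippet has no side effects; only the path setup is live.

```shell
# Install prefix named in the message above.
prefix="$HOME/local/openmpi-1.4.3"

# Build steps, run from the unpacked Open MPI source directory:
#   ./configure --prefix="$prefix"
#   make all install

# Put the private mpicc/mpirun and libraries ahead of the system ones.
export PATH="$prefix/bin:$PATH"
export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "PATH now starts with: ${PATH%%:*}"
```

The same two exports need to be in effect both when compiling the application and when launching it, and the prefix has to be visible on the compute nodes (for example, a home directory shared across the cluster).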

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Michele Marena
No, I can't. I'm not an administrator of the cluster and I'm not the owner. 2011/3/26 Ralph Castain > Can you update to a more recent version? That version is several years > out-of-date - we don't even really support it any more. > > > On Mar 26, 2011, at 1:04 PM, Michele Marena wrote: > > Yes,

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Ralph Castain
Can you update to a more recent version? That version is several years out-of-date - we don't even really support it any more. On Mar 26, 2011, at 1:04 PM, Michele Marena wrote: > Yes, the syntax is wrong in the email, but I write it correctly when I launch > mpirun. When some communicating pr

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Ralph Castain
You don't need to install anything on a system folder - you can just install it in your home directory, assuming that is accessible on the remote nodes. As for the script - unless you can somehow modify it to allow you to run under a debugger, I am afraid you are completely out of luck. On Mar

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Michele Marena
Yes, the syntax is wrong in the email, but I write it correctly when I launch mpirun. When some communicating processes are on the same node, the application doesn't terminate; otherwise the application terminates and its results are correct. My Open MPI version is 1.2.7. 2011/3/26 Ralph Castain > >

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Jack Bryan
Hi, I am working on a cluster where I am not allowed to install software in system folders. My Open MPI is 1.3.4. I have had a very quick look at padb at http://padb.pittman.org.uk/ . Does it require some software to be installed on the cluster in order to use it? I cannot use the command line to run a job on

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Ralph Castain
On Mar 26, 2011, at 11:34 AM, Michele Marena wrote: > Hi, > I've a problem with shared memory. When my application runs using pure > message passing (one process for node), it terminates and returns correct > results. When 2 processes share a node and use shared memory for exchanges > messages

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Ralph Castain
Have you tried a parallel debugger such as padb? On Mar 26, 2011, at 10:34 AM, Jack Bryan wrote: > Hi, > > I have tried this. But, the printout from 200 parallel processes make it > very hard to locate the possible bug. > > They may not stop at the same point when the program got signal 9. >

[OMPI users] Shared Memory Problem.

2011-03-26 Thread Michele Marena
Hi, I have a problem with shared memory. When my application runs using pure message passing (one process per node), it terminates and returns correct results. When 2 processes share a node and use shared memory to exchange messages, my application doesn't terminate. At the shell I write "mpirun -nolocal

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Jack Bryan
Hi, I have tried this. But the printout from 200 parallel processes makes it very hard to locate the possible bug. They may not stop at the same point when the program gets signal 9. So, even though I can figure out the print statements from all 200 processes, so many different locations whe

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Ralph Castain
Try adding some print statements so you can see where the error occurs. On Mar 25, 2011, at 11:49 PM, Jack Bryan wrote: > Hi , All: > > I running a Open MPI (1.3.4) program by 200 parallel processes. > > But, the program is terminated with > > ---

[OMPI users] OMPI error terminate w/o reasons

2011-03-26 Thread Jack Bryan
Hi, all: I am running an Open MPI (1.3.4) program with 200 parallel processes. But the program is terminated with: -- mpirun noticed that process rank 0 with PID 77967 on node n342 exited on signal 9 (Killed). --