That command line cannot possibly work. Both the -rf and --output-filename 
options require arguments.

PLEASE read the documentation. "mpirun -h" or "man mpirun" will tell you how to 
use these options correctly.
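
As a rough sketch of the shape such a command takes: each option is followed directly by its own argument, and the application comes last. This assumes -rf is meant to name a rankfile; every path below is hypothetical and needs to be replaced with your own.

```shell
#!/bin/sh
# Hypothetical paths, for illustration only: substitute your own.
RANKFILE=/mypath/myrankfile          # argument for -rf
OUT_PREFIX=/mypath/debug/run01       # argument for --output-filename
APP=/mypath/myapplication            # the program to launch, given last

# Assemble the command line so that every option has its argument.
cmd="mpirun -np 200 -rf $RANKFILE --output-filename $OUT_PREFIX $APP"

# Print the line instead of running it, so it can be checked first.
echo "$cmd"
```

Printing the line rather than executing it lets you check the argument order before pasting it into a job script.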


On Mar 26, 2011, at 6:35 PM, Jack Bryan wrote:

> Hi, I used : 
> 
>  mpirun -np 200 -rf  --output-filename /mypath/myapplication
> But, no files are printed out.
> 
> Can the "--debug" option help me here? 
> 
> When I tried :
> 
> -bash-3.2$ mpirun -debug
> --------------------------------------------------------------------------
> A suitable debugger could not be found in your PATH.  Check the values
> specified in the orte_base_user_debugger MCA parameter for the list of
> debuggers that was searched.
> --------------------------------------------------------------------------
> Any help is really appreciated. 
> 
> thanks
> 
> From: r...@open-mpi.org
> Date: Sat, 26 Mar 2011 15:45:39 -0600
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI error terminate w/o reasons
> 
> If you use that mpirun option, mpirun will place the output from each rank 
> into a -separate- file for you. Give it:
> 
> mpirun --output-filename /myhome/debug/run01
> 
> and in /myhome/debug, you will find files:
> 
> run01.0
> run01.1
> ...
> 
> each with the output from the indicated rank.
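
With one file per rank, a quick way to see where each process stopped is to print only the last line of every file. The following is a minimal sketch, assuming the files follow the prefix.rank naming described above; the function name last_lines is made up for this example.

```shell
#!/bin/sh
# Print "rank: last line" for every per-rank file sharing a prefix.
# last_lines is a hypothetical helper name, not part of Open MPI.
last_lines() {
    prefix=$1
    for f in "$prefix".*; do
        [ -f "$f" ] || continue      # the glob matched nothing: skip
        # ${f##*.} keeps the text after the final dot, i.e. the rank.
        printf '%s: %s\n' "${f##*.}" "$(tail -n 1 "$f")"
    done
}

# Prefix taken from the example above; sort -n orders the lines by rank.
last_lines /myhome/debug/run01 | sort -n
```

If each print statement includes the loop iteration, this reduces 200 files to 200 lines, and the ranks that stopped earliest stand out.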
> 
> 
> 
> On Mar 26, 2011, at 3:41 PM, Jack Bryan wrote:
> 
> The cluster can print all output into one file. 
> 
> But checking it for bugs is very hard. 
> 
> The cluster also prints possible error messages into one file. 
> 
> But sometimes the error file is empty, and sometimes it shows only signal 9.
> 
> If I only run dummy tasks on the worker nodes, there are no errors. 
> 
> If I run real tasks, sometimes processes are terminated without any errors before 
> the program exits normally.
> Sometimes the program gets signal 9 but no other error messages. 
> 
> It is weird. 
> 
> Any help is really appreciated. 
> 
> Jack
> From: r...@open-mpi.org
> Date: Sat, 26 Mar 2011 15:18:53 -0600
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI error terminate w/o reasons
> 
> I don't know, but Ashley may be able to help - or you can see his web site 
> for instructions.
> 
> Alternatively, since you can put print statements into your code, have you 
> considered using mpirun's option to direct output from each rank into its own 
> file? Look at "mpirun -h" for the options.
> 
>    -output-filename|--output-filename <arg0>  
>                          Redirect output from application processes into
>                          filename.rank
> 
> 
> On Mar 26, 2011, at 2:48 PM, Jack Bryan wrote:
> 
> Is it possible to have padb print the stack trace and other program 
> execution information into a file?
> 
> I can run the program in gdb as this: 
> 
> mpirun -np 200 -e gdb ./myapplication 
> 
> How can I make gdb print the debug information to a file, 
> so that I can check it after the program is terminated? 
> 
> thanks
> 
> Jack
> 
> From: r...@open-mpi.org
> Date: Sat, 26 Mar 2011 13:56:13 -0600
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI error terminate w/o reasons
> 
> You don't need to install anything on a system folder - you can just install 
> it in your home directory, assuming that is accessible on the remote nodes.
> 
> As for the script - unless you can somehow modify it to allow you to run 
> under a debugger, I am afraid you are completely out of luck.
> 
> 
> On Mar 26, 2011, at 12:54 PM, Jack Bryan wrote:
> 
> Hi, 
> 
> I am working on a cluster, where I am not allowed to install software on 
> system folder. 
> 
> My Open MPI is 1.3.4. 
> 
> I have had a very quick look at padb on http://padb.pittman.org.uk/ . 
> 
> Does it require installing some software on the cluster in order to use it? 
> 
> I cannot run jobs on the cluster from the command line, only through a script.
> 
> thanks
> 
> From: r...@open-mpi.org
> Date: Sat, 26 Mar 2011 12:12:11 -0600
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI error terminate w/o reasons
> 
> Have you tried a parallel debugger such as padb?
> 
> On Mar 26, 2011, at 10:34 AM, Jack Bryan wrote:
> 
> Hi, 
> 
> I have tried this. But the printout from 200 parallel processes makes it 
> very hard to locate the possible bug. 
> 
> They may not stop at the same point when the program gets signal 9.
> 
> So, even if I can sort out the print statements from all
> 200 processes, the many different locations where the processes
> stopped make it harder to find hints about the bug. 
> 
> Are there other programming tricks that can help me 
> narrow down the suspect points ASAP?
> Any help is appreciated. 
> 
> Jack
> 
> From: r...@open-mpi.org
> Date: Sat, 26 Mar 2011 07:53:40 -0600
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI error terminate w/o reasons
> 
> Try adding some print statements so you can see where the error occurs.
> 
> On Mar 25, 2011, at 11:49 PM, Jack Bryan wrote:
> 
> Hi , All: 
> 
> I am running an Open MPI (1.3.4) program with 200 parallel processes. 
> 
> But, the program is terminated with 
> 
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 77967 on node n342 exited on 
> signal 9 (Killed).
> --------------------------------------------------------------------------
> 
> After searching, I found that signal 9 means: 
> 
> the process is currently in an unworkable state and should be terminated with 
> extreme prejudice
> 
>  If a process does not respond to any other termination signals, sending it a 
> SIGKILL signal will almost always cause it to go away.
> 
>  The system will generate SIGKILL for a process itself under some unusual 
> conditions where the program cannot possibly continue to run (even to run a 
> signal handler).
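
The point that a killed process cannot even run a signal handler can be seen in a small experiment. This is only an illustrative sketch, not a diagnosis of the program in question: a child shell installs a handler, is killed with SIGKILL, and writes nothing before it disappears.

```shell
#!/bin/sh
# The child installs a TERM handler, then sends itself SIGKILL.
# SIGKILL cannot be caught, so neither the handler nor the final
# echo runs, and the log file stays empty.
log=$(mktemp)
status=0
sh -c 'trap "echo handler ran" TERM; kill -KILL $$; echo "not reached"' > "$log" || status=$?

echo "exit status: $status"
if [ -s "$log" ]; then
    echo "child wrote output"
else
    echo "child wrote nothing"
fi
```

In bash and dash the reported status is 128 plus the signal number, so 137 here. The empty log matches the empty error file described elsewhere in this thread: the process has no chance to report anything.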
>  
> But, the error message does not indicate any possible reasons for the 
> termination. 
> 
> There is a FOR loop in the main() program. If the loop count is small (< 
> 200), the program works well, 
> but as it becomes larger and larger, the program gets SIGKILL. 
> 
> The cluster where I am running the MPI program does not allow running debug 
> tools. 
> 
> If I run it on a workstation, it will take a very, very long time (for > 200 
> loops) before 
> the error occurs again. 
> 
> What can I do to find the possible bugs ? 
> 
> Any help is really appreciated. 
> 
> thanks
> 
> Jack
> 
> 
> 
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 