You don't need to install anything in a system folder - you can just install padb in your home directory, assuming that is accessible on the remote nodes.
As for the script - unless you can somehow modify it to allow you to run under a debugger, I am afraid you are completely out of luck.

On Mar 26, 2011, at 12:54 PM, Jack Bryan wrote:

> Hi,
>
> I am working on a cluster where I am not allowed to install software in
> a system folder.
>
> My Open MPI is 1.3.4.
>
> I have had a very quick look at padb on http://padb.pittman.org.uk/ .
>
> Does it require some software to be installed on the cluster in order to use it?
>
> I cannot use the command line to run jobs on the cluster, only a script.
>
> thanks
>
> From: r...@open-mpi.org
> Date: Sat, 26 Mar 2011 12:12:11 -0600
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI error terminate w/o reasons
>
> Have you tried a parallel debugger such as padb?
>
> On Mar 26, 2011, at 10:34 AM, Jack Bryan wrote:
>
> Hi,
>
> I have tried this. But the printout from 200 parallel processes makes it
> very hard to locate the possible bug.
>
> The processes may not stop at the same point when the program gets signal 9.
>
> So, even if I can sort out the print statements from all
> 200 processes, the many different locations where the processes
> stopped make it hard to find any hints about the bug.
>
> Are there other programming tricks that can help me
> narrow down the suspect points ASAP?
> Any help is appreciated.
>
> Jack
>
> From: r...@open-mpi.org
> Date: Sat, 26 Mar 2011 07:53:40 -0600
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI error terminate w/o reasons
>
> Try adding some print statements so you can see where the error occurs.
>
> On Mar 25, 2011, at 11:49 PM, Jack Bryan wrote:
>
> Hi, All:
>
> I am running an Open MPI (1.3.4) program with 200 parallel processes.
>
> But the program is terminated with
>
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 77967 on node n342 exited on
> signal 9 (Killed).
> --------------------------------------------------------------------------
>
> After searching, I found that signal 9 means:
>
> the process is currently in an unworkable state and should be terminated with
> extreme prejudice
>
> If a process does not respond to any other termination signals, sending it a
> SIGKILL signal will almost always cause it to go away.
>
> The system will generate SIGKILL for a process itself under some unusual
> conditions where the program cannot possibly continue to run (even to run a
> signal handler).
>
> But the error message does not indicate any possible reason for the
> termination.
>
> There is a FOR loop in the main() program. If the loop count is small (< 200),
> the program works well, but if it becomes larger and larger, the program
> gets SIGKILL.
>
> The cluster where I am running the MPI program does not allow running debug
> tools.
>
> If I run it on a workstation, it will take a very, very long time (for > 200 loops)
> for the error to occur again.
>
> What can I do to find the possible bugs?
>
> Any help is really appreciated.
>
> thanks
>
> Jack
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
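
One way to make the print-statement suggestion quoted above workable with 200 processes is to give each rank its own log file and force every line to disk, since SIGKILL cannot be caught and buffered output is lost. The following is only a rough sketch in Python (the same pattern applies in C or C++ with fprintf and fflush). The helper names `open_rank_log` and `checkpoint` and the file naming scheme are made up for illustration, and the rank is passed in as a plain integer rather than obtained from MPI. Logging peak memory is an assumption on my part: a process that grows with the loop count and is then killed without a message is often hitting a memory limit, but nothing in this thread confirms that.

```python
import os
import resource
import time


def open_rank_log(rank, directory="."):
    """Open a line-buffered log file for one rank (hypothetical naming scheme)."""
    path = os.path.join(directory, f"debug_rank_{rank:04d}.log")
    return open(path, "a", buffering=1)


def checkpoint(log, rank, label):
    """Append one timestamped line and push it to disk immediately.

    SIGKILL cannot be handled, so anything still sitting in a buffer
    when the process dies never reaches the file.
    """
    # Peak resident set size of this process (kilobytes on Linux).
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    log.write(f"{time.time():.3f} rank={rank} peak_rss={peak_rss} {label}\n")
    log.flush()
    os.fsync(log.fileno())
```

In a real MPI program the rank would come from MPI_Comm_rank (or `MPI.COMM_WORLD.Get_rank()` in mpi4py), and `checkpoint` would be called at the top and bottom of each loop iteration. The last line in each file then shows where that rank was when it died, and a steadily rising `peak_rss` across iterations would point toward memory exhaustion.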