Abhisek -- You are having two problems:
1. In the first problem (the "orted: Command not found" error), Open MPI was not finding its "orted" helper executable on the remote nodes of your cluster. When you "module load ..." something, it only sets the relevant PATH / LD_LIBRARY_PATH / etc. on the local node; it does nothing for the remote nodes on which you launch. Gilles suggested a good workaround: invoke Open MPI's mpirun by its full path name. This tells Open MPI "hey, please be sure to set PATH / LD_LIBRARY_PATH so that Open MPI's executables can be found on the remote nodes."

2. The second problem is that Open MPI is not finding the nwchem executable on at least the specified node in your cluster (cx934). Perhaps something is wrong with the network filesystem on cx934...? You might want to log in to that node interactively and check that nwchem is visible there.
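For example, something along these lines should cover both checks (a sketch only -- the install locations /opt/openmpi-1.6.4 and /shared/apps/nwchem below are placeholders; substitute wherever Open MPI and NWChem actually live on your cluster):

    # Use mpirun's absolute path so Open MPI can set PATH/LD_LIBRARY_PATH on the
    # remote nodes, and give nwchem's absolute path so it is found on every node:
    /opt/openmpi-1.6.4/bin/mpirun -np 32 --hostfile myhostfile -loadbalance \
        /shared/apps/nwchem/bin/nwchem filename.nw

    # Quick sanity check: are orted and nwchem actually visible from cx934?
    ssh cx934 ls -l /opt/openmpi-1.6.4/bin/orted /shared/apps/nwchem/bin/nwchem

    # If the launch still fails, Gilles' verbose option shows what the launcher is doing:
    /opt/openmpi-1.6.4/bin/mpirun --mca plm_base_verbose 100 -np 32 \
        --hostfile myhostfile -loadbalance /shared/apps/nwchem/bin/nwchem filename.nw

If the full-path launch works but plain "mpirun" does not, rebuilding Open MPI with --enable-mpirun-prefix-by-default (as Gilles noted) should make the full path unnecessary.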
> On Aug 2, 2015, at 7:32 AM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>
> Hi,
> I have tried using full paths for both of them. But stuck in the same issue.
>
> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Is ompi installed on the other node and at the same location ?
> did you configure ompi with --enable-mpirun-prefix-by-default ?
> (note that should not be necessary if you invoke mpirun with full path )
>
> you can also try
> /.../bin/mpirun --mca plm_base_verbose 100 ...
>
> and see if there is something wrong
>
> last but not least, can you try to use full path for both mpirun and nwchem ?
>
> Cheers,
>
> Gilles
>
> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
> Yes, I have tried this and got the following error:
>
> mpirun was unable to launch the specified application as it could not find an executable:
>
> Executable: nwchem
> Node: cx934
>
> while attempting to start process rank 16.
>
> Note that I have to run my code with the "nwchem filename.nw" command.
> When I run the same thing on 1 node with 16 processors, it works fine (mpirun -np 16 nwchem filename.nw).
> I can't understand why I am having a problem when trying to go multinode.
>
> Thanks.
>
> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Can you try invoking mpirun with its full path instead ?
> e.g. /usr/local/bin/mpirun instead of mpirun
>
> Cheers,
>
> Gilles
>
> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
> Here are the other details,
>
> a. The Open MPI version is 1.6.4
>
> b. The error being generated is:
>
> Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of known hosts.
> Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of known hosts.
> orted: Command not found.
> orted: Command not found.
> --------------------------------------------------------------------------
> A daemon (pid 53580) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
>
> I'm not able to understand why the "command not found" error is being raised.
> Thank you.
>
> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Would you please tell us:
>
> (a) what version of OMPI you are using
>
> (b) what error message you are getting when the job terminates
>
>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>
>> I'm working on an Open MPI enabled cluster. I'm trying to run a job with 2
>> different nodes and 16 processors per node, using this command:
>>
>> mpirun -np 32 --hostfile myhostfile -loadbalance exe
>>
>> The contents of myhostfile:
>>
>> cx0937 slots=16
>> cx0934 slots=16
>>
>> But the job is getting terminated each time before job allocation happens
>> in the desired way.
>>
>> So it would be very nice to get some suggestions about what I'm missing.
>>
>> Thank you
>>
>> --
>> Abhisek Mondal
>> Research Fellow
>> Structural Biology and Bioinformatics
>> Indian Institute of Chemical Biology
>> Kolkata 700032
>> INDIA

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/