The initial error was that Open MPI could not find orted on the second node; it was fixed by invoking mpirun with its full path.
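
A minimal PBS script along the lines suggested in the thread below might look like the following sketch; the Open MPI install prefix (/opt/openmpi-1.6.4) is an assumption, while the module names, the hostname sanity check, and the nwchem command come from the thread itself:

    #!/bin/bash
    #PBS -l nodes=2:ppn=16
    cd $PBS_O_WORKDIR

    module load openmpi nwchem

    # Sanity check suggested in the thread: launch a non-MPI program first
    # to confirm that orted starts on both allocated nodes
    /opt/openmpi-1.6.4/bin/mpirun hostname

    # Invoking mpirun by its full path lets the matching orted and libraries
    # be found on the remote node; under PBS no hostfile is needed
    /opt/openmpi-1.6.4/bin/mpirun -np 32 nwchem filename.nw
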
If you run under PBS, you should not need the hostfile option; just ask PBS to allocate 2 nodes and everything should run smoothly. At first, I recommend you run a non-MPI application (/.../bin/mpirun hostname) and then nwchem. If it still does not work, run with verbose plm and post the output.

Cheers,

Gilles

On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
> I'm on an HPC cluster, so openmpi-1.6.4 here is installed as a module.
> In the .pbs script, before executing my code line, I load both the
> "nwchem" and "openmpi" modules.
> It works very nicely when I run on a single node (with 16 processors),
> but if I try to switch to multiple nodes with the "hostfile" option,
> things start to crash.
>
> On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>
>> Hi,
>> I have tried using full paths for both of them, but I am stuck on the
>> same issue.
>>
>> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Is ompi installed on the other node and at the same location?
>>> Did you configure ompi with --enable-mpirun-prefix-by-default?
>>> (note that should not be necessary if you invoke mpirun with full path)
>>>
>>> You can also try
>>> /.../bin/mpirun --mca plm_base_verbose 100 ...
>>> and see if there is something wrong.
>>>
>>> Last but not least, can you try to use the full path for both mpirun
>>> and nwchem?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>
>>>> Yes, I have tried this and got the following error:
>>>>
>>>> mpirun was unable to launch the specified application as it could not
>>>> find an executable:
>>>>
>>>> Executable: nwchem
>>>> Node: cx934
>>>>
>>>> while attempting to start process rank 16.
>>>>
>>>> Note that I have to run my code with the "nwchem filename.nw" command.
>>>> When I run the same thing on 1 node with 16 processors, it works fine
>>>> (mpirun -np 16 nwchem filename.nw).
>>>> I can't understand why I am having a problem when trying to go
>>>> multi-node.
>>>>
>>>> Thanks.
>>>>
>>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet
>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>> Can you try invoking mpirun with its full path instead?
>>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>
>>>>>> Here are the other details:
>>>>>>
>>>>>> a. The Open MPI version is 1.6.4.
>>>>>>
>>>>>> b. The error being generated is:
>>>>>>
>>>>>> Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of
>>>>>> known hosts.
>>>>>> Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of
>>>>>> known hosts.
>>>>>> orted: Command not found.
>>>>>> orted: Command not found.
>>>>>> --------------------------------------------------------------------------
>>>>>> A daemon (pid 53580) died unexpectedly with status 1 while attempting
>>>>>> to launch so we are aborting.
>>>>>>
>>>>>> There may be more information reported by the environment (see above).
>>>>>>
>>>>>> This may be because the daemon was unable to find all the needed shared
>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>>> the location of the shared libraries on the remote nodes and this will
>>>>>> automatically be forwarded to the remote nodes.
>>>>>> --------------------------------------------------------------------------
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>> that caused that situation.
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> I am not able to understand why the "command not found" error is
>>>>>> being raised.
>>>>>> Thank you.
>>>>>>
>>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>>> Would you please tell us:
>>>>>>>
>>>>>>> (a) what version of OMPI you are using
>>>>>>>
>>>>>>> (b) what error message you are getting when the job terminates
>>>>>>>
>>>>>>>
>>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>>>
>>>>>>> I'm working on an Open MPI enabled cluster. I'm trying to run a job
>>>>>>> with 2 different nodes and 16 processors per node, using this command:
>>>>>>>
>>>>>>> mpirun -np 32 --hostfile myhostfile -loadbalance exe
>>>>>>>
>>>>>>> The contents of myhostfile:
>>>>>>>
>>>>>>> cx0937 slots=16
>>>>>>> cx0934 slots=16
>>>>>>>
>>>>>>> But the job gets terminated each time before the job allocation
>>>>>>> happens the desired way.
>>>>>>>
>>>>>>> So, it will be very nice if I get some suggestions regarding the
>>>>>>> facts I'm missing.
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> --
>>>>>>> Abhisek Mondal
>>>>>>> Research Fellow
>>>>>>> Structural Biology and Bioinformatics
>>>>>>> Indian Institute of Chemical Biology
>>>>>>> Kolkata 700032
>>>>>>> INDIA
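
As a concrete form of the diagnostic run Gilles suggests above, a sketch might look like this; only the --mca plm_base_verbose option comes from the thread, while the install prefix and the full path to nwchem are assumptions:

    # Verbose plm output shows how mpirun launches orted on the remote node
    # (the remote launch command and the orted path it tries to use)
    /opt/openmpi-1.6.4/bin/mpirun --mca plm_base_verbose 100 \
        -np 32 /apps/nwchem/bin/nwchem filename.nw
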
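The daemon error quoted in the thread also suggests making the Open MPI libraries visible on the remote nodes. A minimal sketch using the standard --prefix and -x options follows; the paths are assumptions, and --prefix is redundant once mpirun is invoked by its full path, which is effectively the fix reported above:

    # --prefix tells the remote nodes where the Open MPI installation lives,
    # so orted and its shared libraries can be found there;
    # -x exports the local LD_LIBRARY_PATH to the launched application processes
    /opt/openmpi-1.6.4/bin/mpirun --prefix /opt/openmpi-1.6.4 \
        -x LD_LIBRARY_PATH -np 32 /apps/nwchem/bin/nwchem filename.nw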