I'm on an HPC cluster, so openmpi-1.6.4 is installed here as a module. In the .pbs script, before the line that executes my code, I load both the "nwchem" and "openmpi" modules. This works very nicely when I run on a single node (with 16 processors), but if I try to switch to multiple nodes with the "hostfile" option, things start to crash.
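For reference, a minimal sketch of a .pbs script along these lines. The module names, install prefixes (/opt/...), and the resource request are assumptions that vary per site; this is not the poster's actual script.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=16
#PBS -l walltime=01:00:00

# Site-specific module names -- assumed here; check `module avail`.
module load openmpi/1.6.4
module load nwchem

cd "$PBS_O_WORKDIR"

# Use full paths for both mpirun and nwchem, plus --prefix, so the remote
# nodes can resolve orted and the application even though their
# non-interactive shells never run the `module load` lines above.
# The /opt/... prefixes are hypothetical placeholders.
/opt/openmpi-1.6.4/bin/mpirun --prefix /opt/openmpi-1.6.4 \
    -np 32 --hostfile myhostfile -loadbalance \
    /opt/nwchem/bin/nwchem filename.nw
```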
On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
> Hi,
> I have tried using full paths for both of them, but I am stuck on the same issue.
>
> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>> Is ompi installed on the other node, and at the same location?
>> Did you configure ompi with --enable-mpirun-prefix-by-default?
>> (Note that this should not be necessary if you invoke mpirun with its full path.)
>>
>> You can also try
>>   /.../bin/mpirun --mca plm_base_verbose 100 ...
>> and see if there is something wrong.
>>
>> Last but not least, can you try to use the full path for both mpirun and nwchem?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>> Yes, I have tried this and got the following error:
>>>
>>> mpirun was unable to launch the specified application as it could not find an executable:
>>>
>>> Executable: nwchem
>>> Node: cx934
>>>
>>> while attempting to start process rank 16.
>>>
>>> Given that I have to run my code with the "nwchem filename.nw" command: when I run the same thing on 1 node with 16 processors, it works fine (mpirun -np 16 nwchem filename.nw).
>>> I can't understand why I am having a problem when trying to go multi-node.
>>>
>>> Thanks.
>>>
>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>> Can you try invoking mpirun with its full path instead?
>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>> Here are the other details:
>>>>>
>>>>> a. The Open MPI version is 1.6.4.
>>>>>
>>>>> b.
>>>>> The error being generated is:
>>>>>
>>>>> Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of known hosts.
>>>>> Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of known hosts.
>>>>> orted: Command not found.
>>>>> orted: Command not found.
>>>>> --------------------------------------------------------------------------
>>>>> A daemon (pid 53580) died unexpectedly with status 1 while attempting
>>>>> to launch so we are aborting.
>>>>>
>>>>> There may be more information reported by the environment (see above).
>>>>>
>>>>> This may be because the daemon was unable to find all the needed shared
>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>> location of the shared libraries on the remote nodes and this will
>>>>> automatically be forwarded to the remote nodes.
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> I am not able to understand why the "command not found" error is being raised.
>>>>> Thank you.
>>>>>
>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> Would you please tell us:
>>>>>>
>>>>>> (a) what version of OMPI you are using
>>>>>>
>>>>>> (b) what error message you are getting when the job terminates
>>>>>>
>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>>
>>>>>> I'm working on an Open MPI enabled cluster. I'm trying to run a job with 2 different nodes and 16 processors per node.
>>>>>> Using this command:
>>>>>>
>>>>>>   mpirun -np 32 --hostfile myhostfile -loadbalance exe
>>>>>>
>>>>>> The contents of myhostfile:
>>>>>>
>>>>>>   cx0937 slots=16
>>>>>>   cx0934 slots=16
>>>>>>
>>>>>> But the job is getting terminated each time before the allocation happens in the desired way.
>>>>>>
>>>>>> So it would be very nice to get some suggestions about what I'm missing.
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> --
>>>>>> Abhisek Mondal
>>>>>> Research Fellow
>>>>>> Structural Biology and Bioinformatics
>>>>>> Indian Institute of Chemical Biology
>>>>>> Kolkata 700032
>>>>>> INDIA
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Searchable archives: http://www.open-mpi.org/community/lists/users/2015/08/27367.php
--
Abhisek Mondal
Research Fellow
Structural Biology and Bioinformatics
Indian Institute of Chemical Biology
Kolkata 700032
INDIA
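The "orted: Command not found" failure in the thread above is a PATH problem on the remote nodes: the shell that ssh spawns there is non-interactive and never runs the job script's `module load` lines, so Open MPI's orted daemon is not on its PATH. A small sketch of the symptom, reproduced locally; the /nonexistent/bin PATH is just a stand-in for a remote shell that lacks the openmpi module:

```shell
# Run a shell whose PATH cannot contain orted, mimicking the remote
# non-interactive shell; `command -v` fails, just as the remote exec does.
env -i PATH=/nonexistent/bin /bin/sh -c \
    'command -v orted || echo "orted: Command not found"'
# Prints: orted: Command not found
```

The fixes discussed in the thread address exactly this: invoke mpirun by its full path (ideally with --prefix), or configure Open MPI with --enable-mpirun-prefix-by-default, so mpirun tells the remote nodes where to find orted.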