Jeff, Gilles,

Here's my scenario again, after trying something different: I have interactively booked 2 nodes (cx1015 and cx1016) and am working on node cx1015. There I ran "module load openmpi" and "module load nwchem" (but I don't know how to "module load" on the other node), and then launched the job with the full-path mpirun:

<path>/mpirun --hostfile myhostfile -np 32 <path>/nwchem my_code.nw

And, amazingly, it is working. But can you suggest a way for me to make sure that both booked nodes are being used by mpirun, not just one? Thanks.
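Following Gilles' suggestion below to try a non-MPI application first, I am planning to check the node usage with something like the command below (same placeholder paths and the same hostfile as above); if both cx1015 and cx1016 show up 16 times each, I will take that as confirmation that the ranks really are spread over the two nodes:

  <path>/mpirun --hostfile myhostfile -np 32 hostname | sort | uniq -c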
On Sun, Aug 2, 2015 at 5:16 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> The initial error was that ompi could not find orted on the second node, and that was fixed by using the full path for mpirun.
>
> If you run under PBS, you should not need the hostfile option: just ask PBS to allocate 2 nodes and everything should run smoothly.
>
> At first, I recommend you run a non-MPI application:
> /.../bin/mpirun hostname
> and then nwchem.
>
> If it still does not work, then run with plm verbose and post the output.
>
> Cheers,
>
> Gilles
>
> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>
>> I'm on an HPC cluster, so openmpi-1.6.4 is installed here as a module. In my .pbs script, before the line that executes my code, I load both the "nwchem" and "openmpi" modules.
>> It works very nicely when I run on a single node (with 16 processors), but when I try to switch to multiple nodes with the "hostfile" option, things start to crash.
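For reference, a stripped-down sketch of the .pbs script I mention above is given here; the resource request line and the cd into $PBS_O_WORKDIR are my reconstruction rather than a copy of the real script, and the <path>/ pieces again stand in for the actual install locations on this cluster:

  #!/bin/bash
  #PBS -l nodes=2:ppn=16
  # (2 nodes with 16 processors each, as in the runs described above)

  cd $PBS_O_WORKDIR          # start from the submission directory
  module load openmpi        # make mpirun available
  module load nwchem         # make nwchem available

  # Under PBS, mpirun should pick up the allocated nodes on its own
  # (assuming Open MPI was built with Torque/PBS support); otherwise
  # "--hostfile $PBS_NODEFILE" can be added to the line below.
  <path>/mpirun -np 32 <path>/nwchem my_code.nw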
>> On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>
>>> Hi,
>>> I have tried using full paths for both of them, but I am stuck with the same issue.
>>>
>>> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Is ompi installed on the other node, and at the same location?
>>>> Did you configure ompi with --enable-mpirun-prefix-by-default?
>>>> (Note that this should not be necessary if you invoke mpirun with its full path.)
>>>>
>>>> You can also try
>>>> /.../bin/mpirun --mca plm_base_verbose 100 ...
>>>> and see if there is something wrong.
>>>>
>>>> Last but not least, can you try to use the full path for both mpirun and nwchem?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>
>>>>> Yes, I have tried this and got the following error:
>>>>>
>>>>> mpirun was unable to launch the specified application as it could not find an executable:
>>>>>
>>>>> Executable: nwchem
>>>>> Node: cx934
>>>>>
>>>>> while attempting to start process rank 16.
>>>>>
>>>>> For context, I have to run my code with the "nwchem filename.nw" command. When I run the same thing on 1 node with 16 processors, it works fine (mpirun -np 16 nwchem filename.nw). I can't understand why I am having a problem when trying to go multi-node.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>>> Can you try invoking mpirun with its full path instead?
>>>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>>
>>>>>>> Here are the other details:
>>>>>>>
>>>>>>> a. The Open MPI version is 1.6.4.
>>>>>>>
>>>>>>> b. The error being generated is:
>>>>>>>
>>>>>>> Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of known hosts.
>>>>>>> Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of known hosts.
>>>>>>> orted: Command not found.
>>>>>>> orted: Command not found.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> A daemon (pid 53580) died unexpectedly with status 1 while attempting
>>>>>>> to launch so we are aborting.
>>>>>>>
>>>>>>> There may be more information reported by the environment (see above).
>>>>>>>
>>>>>>> This may be because the daemon was unable to find all the needed shared
>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>>> location of the shared libraries on the remote nodes and this will
>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>>> that caused that situation.
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> I am not able to understand why the "command not found" error is being raised.
>>>>>>> Thank you.
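For the record, in my case this "orted: Command not found" error went away once I started invoking mpirun by its full path, as described at the top of this mail. From the help text quoted above and Gilles' remark about --enable-mpirun-prefix-by-default, my understanding is that the usual alternatives are either telling mpirun where Open MPI lives on the remote nodes, or exporting the Open MPI directories in the shell start-up file the remote shells read (e.g. ~/.bashrc for bash). A rough sketch, with /path/to/openmpi standing in for the actual install prefix on this cluster:

  # alternative 1: let mpirun export PATH/LD_LIBRARY_PATH to the remote nodes
  mpirun --prefix /path/to/openmpi --hostfile myhostfile -np 32 <path>/nwchem my_code.nw

  # alternative 2: add the Open MPI directories in ~/.bashrc on the cluster
  export PATH=/path/to/openmpi/bin:$PATH
  export LD_LIBRARY_PATH=/path/to/openmpi/lib:$LD_LIBRARY_PATH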
>>>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>> Would you please tell us:
>>>>>>>>
>>>>>>>> (a) what version of OMPI you are using
>>>>>>>>
>>>>>>>> (b) what error message you are getting when the job terminates
>>>>>>>>
>>>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I'm working on an Open MPI enabled cluster. I'm trying to run a job with 2 nodes and 16 processors per node, using this command:
>>>>>>>>
>>>>>>>> mpirun -np 32 --hostfile myhostfile -loadbalance exe
>>>>>>>>
>>>>>>>> The contents of myhostfile:
>>>>>>>>
>>>>>>>> cx0937 slots=16
>>>>>>>> cx0934 slots=16
>>>>>>>>
>>>>>>>> But the job gets terminated each time before the processes are allocated the way I want.
>>>>>>>>
>>>>>>>> So it would be very nice to get some suggestions about what I'm missing.
>>>>>>>>
>>>>>>>> Thank you.

--
Abhisek Mondal
Research Fellow
Structural Biology and Bioinformatics
Indian Institute of Chemical Biology
Kolkata 700032
INDIA
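P.S. Assuming the interactive booking also went through PBS, one more thing I can check before launching anything is what PBS itself allocated, by counting the entries in its node file:

  sort $PBS_NODEFILE | uniq -c

If that lists cx1015 and cx1016 with 16 slots each, then myhostfile presumably only needs the two lines below, written the same way as the earlier file quoted in this thread:

  cx1015 slots=16
  cx1016 slots=16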