Simply replace nwchem with hostname; both hosts should be part of the output...
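For example, something along these lines (using the same full path to mpirun and the same hostfile as in your command; the path below is only a placeholder for your actual Open MPI bin directory) shows at a glance how many ranks landed on each node:

    # Expect two lines, each with a count of 16: one for cx1015 and one for cx1016,
    # if both allocated nodes are really being used.
    /path/to/openmpi-1.6.4/bin/mpirun --hostfile myhostfile -np 32 hostname | sort | uniq -c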
Cheers,

Gilles

On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
> Jeff, Gilles,
>
> Here's my scenario again, after trying something different:
> I've interactively booked 2 nodes (cx1015 and cx1016) and I'm working on the
> "cx1015" node. There I ran "module load openmpi" and "module load nwchem"
> (but I don't know how to "module load" on the other node).
> Then I used the openmpi command to run:
>     <path>/mpirun --hostfile myhostfile -np 32 <path>/nwchem my_code.nw
>
> And AMAZINGLY it is working...
>
> But can you guys suggest a way for me to make sure both of the booked
> nodes are being used by mpirun, not just one?
>
> Thanks.
>
> On Sun, Aug 2, 2015 at 5:16 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>> The initial error was that ompi could not find orted on the second node,
>> and that was fixed by using the full path for mpirun.
>>
>> If you run under PBS, you should not need the hostfile option.
>> Just ask PBS to allocate 2 nodes and everything should run smoothly.
>>
>> At first, I recommend you run a non-MPI application
>>     /.../bin/mpirun hostname
>> and then nwchem.
>>
>> If it still does not work, then run with the plm verbose option and post
>> the output.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>> I'm on an HPC cluster, so openmpi-1.6.4 is installed here as a module.
>>> In the .pbs script, before executing my code line, I load both the
>>> "nwchem" and "openmpi" modules.
>>> It works very nicely when I run on a single node (with 16 processors),
>>> but if I try to switch to multiple nodes with the "hostfile" option,
>>> things start to crash.
>>>
>>> On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>> Hi,
>>>> I have tried using full paths for both of them, but I'm stuck on the
>>>> same issue.
>>>>
>>>> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet
>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>> Is ompi installed on the other node, and at the same location?
>>>>> Did you configure ompi with --enable-mpirun-prefix-by-default?
>>>>> (Note that this should not be necessary if you invoke mpirun with its
>>>>> full path.)
>>>>>
>>>>> You can also try
>>>>>     /.../bin/mpirun --mca plm_base_verbose 100 ...
>>>>> and see if there is something wrong.
>>>>>
>>>>> Last but not least, can you try to use the full path for both mpirun
>>>>> and nwchem?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
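Putting the quoted advice above together, the launch line being suggested looks roughly like the following; the install prefix is only a placeholder for whatever "which mpirun" reports after loading the openmpi module:

    # Placeholder prefix; substitute the directory that `which mpirun` reports.
    OMPI=/opt/openmpi-1.6.4
    # Full paths for both mpirun and nwchem, --prefix so that orted and the
    # shared libraries can be found on the remote node, and plm verbosity
    # to show what the launcher is doing.
    $OMPI/bin/mpirun --prefix $OMPI --mca plm_base_verbose 100 \
        --hostfile myhostfile -np 32 $(which nwchem) my_code.nw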
>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>> Yes, I have tried this and got the following error:
>>>>>>
>>>>>>     mpirun was unable to launch the specified application as it could
>>>>>>     not find an executable:
>>>>>>
>>>>>>     Executable: nwchem
>>>>>>     Node: cx934
>>>>>>
>>>>>>     while attempting to start process rank 16.
>>>>>>
>>>>>> Given that: I have to run my code with the "nwchem filename.nw" command.
>>>>>> When I run the same thing on 1 node with 16 processors, it works fine
>>>>>> (mpirun -np 16 nwchem filename.nw).
>>>>>> I can't understand why I'm having problems when trying to go multi-node.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet
>>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>>> Can you try invoking mpirun with its full path instead?
>>>>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>>>> Here are the other details:
>>>>>>>>
>>>>>>>> a. The Open MPI version is 1.6.4.
>>>>>>>>
>>>>>>>> b. The error being generated is:
>>>>>>>>
>>>>>>>>     Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of known hosts.
>>>>>>>>     Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of known hosts.
>>>>>>>>     orted: Command not found.
>>>>>>>>     orted: Command not found.
>>>>>>>>     --------------------------------------------------------------------------
>>>>>>>>     A daemon (pid 53580) died unexpectedly with status 1 while attempting
>>>>>>>>     to launch so we are aborting.
>>>>>>>>
>>>>>>>>     There may be more information reported by the environment (see above).
>>>>>>>>
>>>>>>>>     This may be because the daemon was unable to find all the needed shared
>>>>>>>>     libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>>>>     location of the shared libraries on the remote nodes and this will
>>>>>>>>     automatically be forwarded to the remote nodes.
>>>>>>>>     --------------------------------------------------------------------------
>>>>>>>>     --------------------------------------------------------------------------
>>>>>>>>     mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>>     that caused that situation.
>>>>>>>>     --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> I'm not able to understand why the "command not found" error is being
>>>>>>>> raised.
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>> Would you please tell us:
>>>>>>>>>
>>>>>>>>> (a) what version of OMPI you are using
>>>>>>>>>
>>>>>>>>> (b) what error message you are getting when the job terminates
>>>>>>>>>
>>>>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> I'm working on an Open MPI enabled cluster. I'm trying to run a job
>>>>>>>>> on 2 different nodes with 16 processors per node, using this command:
>>>>>>>>>
>>>>>>>>>     mpirun -np 32 --hostfile myhostfile -loadbalance exe
>>>>>>>>>
>>>>>>>>> The contents of myhostfile:
>>>>>>>>>
>>>>>>>>>     cx0937 slots=16
>>>>>>>>>     cx0934 slots=16
>>>>>>>>>
>>>>>>>>> But the job is getting terminated each time before the job allocation
>>>>>>>>> happens in the desired way.
>>>>>>>>>
>>>>>>>>> So it would be very nice to get some suggestions about what I'm
>>>>>>>>> missing.
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Abhisek Mondal
>>>>>>>>> Research Fellow
>>>>>>>>> Structural Biology and Bioinformatics
>>>>>>>>> Indian Institute of Chemical Biology
>>>>>>>>> Kolkata 700032
>>>>>>>>> INDIA
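And for the batch route mentioned earlier in the thread (ask PBS for the two nodes, load the modules inside the job script, and skip the hostfile), a .pbs script would look roughly like this. The resource request line and walltime are assumptions about the site's PBS setup, and it relies on Open MPI having been built with PBS support so that mpirun can read the allocation itself:

    #!/bin/bash
    #PBS -l nodes=2:ppn=16        # ask PBS itself for two 16-core nodes (site-specific syntax)
    #PBS -l walltime=02:00:00     # illustrative walltime
    cd $PBS_O_WORKDIR

    # Loading the modules inside the job script sets up PATH/LD_LIBRARY_PATH
    # on the batch node.
    module load openmpi
    module load nwchem

    # $(which ...) expands to the full paths the modules put on PATH.
    # With a PBS-aware Open MPI, mpirun takes the node list from the
    # allocation, so no --hostfile is needed; the 32 ranks spread over both nodes.
    $(which mpirun) -np 32 $(which nwchem) my_code.nw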