Simply replace nwchem with hostname in your mpirun command line;

both hosts should appear in the output...
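
For example, a minimal sketch (assuming the same hostfile and module environment as your working run, and that cx1015/cx1016 are the two booked nodes):

  <path>/mpirun --hostfile myhostfile -np 32 hostname
  # expected: 16 lines of "cx1015" and 16 lines of "cx1016", in any order

If only one node name shows up, mpirun is not actually launching processes on the second node.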

Cheers,

Gilles

On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:

> Jeff, Gilles
>
> Here's my scenario again when I tried something different:
> I've interactively booked 2 nodes (cx1015 and cx1016) and am working on
> the "cx1015" node.
> Here I ran "module load openmpi" and "module load nwchem" (but I don't
> know how to "module load" on the other node).
> Then I used the following command to run: "<path>/mpirun --hostfile
> myhostfile -np 32 <path>/nwchem my_code.nw"
>
> And AMAZINGLY it is working...
>
> But can you suggest a way for me to make sure that both of the booked
> nodes are being used by mpirun, not just one?
>
> Thanks.
>
> On Sun, Aug 2, 2015 at 5:16 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> The initial error was that ompi could not find orted on the second node,
>> and that was fixed by using the full path for mpirun.
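>>
>> (using the full path works because mpirun then behaves as if --prefix had
>> been given, so PATH and LD_LIBRARY_PATH for orted get set on the remote
>> node. a rough equivalent, with a placeholder prefix, is to put
>>
>>   export PATH=<ompi prefix>/bin:$PATH
>>   export LD_LIBRARY_PATH=<ompi prefix>/lib:$LD_LIBRARY_PATH
>>
>> in a shell startup file that is read on the compute nodes, or simply to
>> load the openmpi module there.)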
>>
>> if you run under pbs, you should not need the hostfile option.
>> just ask pbs to allocate 2 nodes and everything should run smoothly.
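>>
>> for example, a minimal pbs script sketch (the resource and walltime values
>> below are hypothetical, adjust for your site):
>>
>>   #!/bin/bash
>>   #PBS -l nodes=2:ppn=16
>>   #PBS -l walltime=01:00:00
>>   cd $PBS_O_WORKDIR
>>   module load openmpi nwchem
>>   # a pbs-aware open mpi reads the allocation itself, so no --hostfile
>>   /.../bin/mpirun -np 32 nwchem my_code.nw
>>
>> (replace /.../bin with the full path to your open mpi installation)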
>>
>> at first, I recommend you run a non-MPI application
>> /.../bin/mpirun hostname
>> and then nwchem
>>
>> if it still does not work, then run with the plm verbose option and post the output
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>
>>> I'm on an HPC cluster, so openmpi-1.6.4 is installed here as a module.
>>> In the .pbs script, before the line that executes my code, I'm loading
>>> both the "nwchem" and "openmpi" modules.
>>> It works very nicely when I run on a single node (with 16 processors),
>>> but if I try to switch to multiple nodes with the "hostfile" option,
>>> things start to crash.
>>>
>>> On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I have tried using full paths for both of them, but I'm stuck on the
>>>> same issue.
>>>>
>>>> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <
>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>> Is ompi installed on the other node and at the same location?
>>>>> did you configure ompi with --enable-mpirun-prefix-by-default?
>>>>> (note that should not be necessary if you invoke mpirun with the full
>>>>> path)
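>>>>>
>>>>> for reference, a minimal sketch of building with that option (the
>>>>> prefix below is only a placeholder):
>>>>>
>>>>>   ./configure --prefix=<install dir> --enable-mpirun-prefix-by-default
>>>>>   make all install
>>>>>
>>>>> with that flag, mpirun always behaves as if --prefix <install dir> had
>>>>> been passed, so remote nodes can locate orted and its libraries.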
>>>>>
>>>>> you can also try
>>>>> /.../bin/mpirun --mca plm_base_verbose 100 ...
>>>>>
>>>>> and see if there is something wrong
>>>>>
>>>>> last but not least, can you try to use the full path for both mpirun
>>>>> and nwchem?
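>>>>>
>>>>> something along these lines (a sketch only; the paths come from
>>>>> whatever the modules put in your PATH):
>>>>>
>>>>>   MPIRUN=$(which mpirun)    # after: module load openmpi
>>>>>   NWCHEM=$(which nwchem)    # after: module load nwchem
>>>>>   $MPIRUN --hostfile myhostfile -np 32 $NWCHEM my_code.nw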
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, I have tried this and got the following error:
>>>>>>
>>>>>> *mpirun was unable to launch the specified application as it could
>>>>>> not find an executable:*
>>>>>>
>>>>>> *Executable: nwchem*
>>>>>> *Node: cx934*
>>>>>>
>>>>>> *while attempting to start process rank 16.*
>>>>>>
>>>>>> Note that I have to run my code with the "nwchem filename.nw" command.
>>>>>> When I run the same thing on 1 node with 16 processors, it works fine
>>>>>> (mpirun -np 16 nwchem filename.nw).
>>>>>> I can't understand why I'm having this problem when trying to go
>>>>>> multi-node.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <
>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>
>>>>>>> Can you try invoking mpirun with its full path instead?
>>>>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Here is the other details,
>>>>>>>>
>>>>>>>> a. The Openmpi version is 1.6.4
>>>>>>>>
>>>>>>>> b. The error being generated is:
>>>>>>>> *Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of
>>>>>>>> known hosts.*
>>>>>>>> *Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list
>>>>>>>> of known hosts.*
>>>>>>>> *orted: Command not found.*
>>>>>>>> *orted: Command not found.*
>>>>>>>>
>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>> *A daemon (pid 53580) died unexpectedly with status 1 while
>>>>>>>> attempting*
>>>>>>>> *to launch so we are aborting.*
>>>>>>>>
>>>>>>>> *There may be more information reported by the environment (see
>>>>>>>> above).*
>>>>>>>>
>>>>>>>> *This may be because the daemon was unable to find all the needed
>>>>>>>> shared*
>>>>>>>> *libraries on the remote node. You may set your LD_LIBRARY_PATH to
>>>>>>>> have the*
>>>>>>>> *location of the shared libraries on the remote nodes and this will*
>>>>>>>> *automatically be forwarded to the remote nodes.*
>>>>>>>>
>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>>
>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>> *mpirun noticed that the job aborted, but has no info as to the
>>>>>>>> process*
>>>>>>>> *that caused that situation.*
>>>>>>>>
>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not able to understand why the "command not found" error is
>>>>>>>> being raised.
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Would you please tell us:
>>>>>>>>>
>>>>>>>>> (a) what version of OMPI you are using
>>>>>>>>>
>>>>>>>>> (b) what error message you are getting when the job terminates
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <
>>>>>>>>> abhisek.m...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> I'm working on an openmpi-enabled cluster. I'm trying to run a job
>>>>>>>>> across 2 different nodes with 16 processors per node.
>>>>>>>>> Using this command:
>>>>>>>>>
>>>>>>>>> *mpirun -np 32 --hostfile myhostfile -loadbalance exe*
>>>>>>>>>
>>>>>>>>> The contents of myhostfile:
>>>>>>>>>
>>>>>>>>> *cx0937 slots=16    *
>>>>>>>>> *cx0934 slots=16*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But the job gets terminated each time, before the allocation
>>>>>>>>> happens in the desired way.
>>>>>>>>>
>>>>>>>>> So it would be very nice to get some suggestions about what I'm
>>>>>>>>> missing.
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Abhisek Mondal
>>>>>>>>>
>>>>>>>>> *Research Fellow*
>>>>>>>>>
>>>>>>>>> *Structural Biology and Bioinformatics*
>>>>>>>>> *Indian Institute of Chemical Biology*
>>>>>>>>>
>>>>>>>>> *Kolkata 700032*
>>>>>>>>>
>>>>>>>>> *INDIA*
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>> Link to this post:
>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27367.php
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Abhisek Mondal
>>>>>>>>
>>>>>>>> *Research Fellow*
>>>>>>>>
>>>>>>>> *Structural Biology and Bioinformatics*
>>>>>>>> *Indian Institute of Chemical Biology*
>>>>>>>>
>>>>>>>> *Kolkata 700032*
>>>>>>>>
>>>>>>>> *INDIA*
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27369.php
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Abhisek Mondal
>>>>>>
>>>>>> *Research Fellow*
>>>>>>
>>>>>> *Structural Biology and Bioinformatics*
>>>>>> *Indian Institute of Chemical Biology*
>>>>>>
>>>>>> *Kolkata 700032*
>>>>>>
>>>>>> *INDIA*
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27371.php
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Abhisek Mondal
>>>>
>>>> *Research Fellow*
>>>>
>>>> *Structural Biology and Bioinformatics*
>>>> *Indian Institute of Chemical Biology*
>>>>
>>>> *Kolkata 700032*
>>>>
>>>> *INDIA*
>>>>
>>>
>>>
>>>
>>> --
>>> Abhisek Mondal
>>>
>>> *Research Fellow*
>>>
>>> *Structural Biology and Bioinformatics*
>>> *Indian Institute of Chemical Biology*
>>>
>>> *Kolkata 700032*
>>>
>>> *INDIA*
>>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/08/27375.php
>>
>
>
>
> --
> Abhisek Mondal
>
> *Research Fellow*
>
> *Structural Biology and Bioinformatics*
> *Indian Institute of Chemical Biology*
>
> *Kolkata 700032*
>
> *INDIA*
>
