Jeff, Gilles

Here's my scenario again, this time trying something different:
I've interactively booked 2 nodes (cx1015 and cx1016) and I'm working on the
"cx1015" node.
Here I ran "module load openmpi" and "module load nwchem" (but I don't know
how to "module load" on the other node).
Then I used the Open MPI command to run: "<path>/mpirun --hostfile myhostfile
-np 32 <path>/nwchem my_code.nw"

And AMAZINGLY it is working...

But could you suggest a way for me to make sure that mpirun is actually using
both of the booked nodes, not just one?
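
For example, would launching a trivial non-MPI command across the same
hostfile and counting where the processes land be a valid check? Something
like this (with the same <path> as above):

<path>/mpirun --hostfile myhostfile -np 32 hostname | sort | uniq -c

If both cx1015 and cx1016 show up 16 times each, I would take that to mean
the real run is also spread over the two nodes.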

Thanks.

On Sun, Aug 2, 2015 at 5:16 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> The initial error was that ompi could not find orted on the second node, and
> that was fixed by using the full path for mpirun.
>
> if you run under PBS, you should not need the hostfile option.
> just ask PBS to allocate 2 nodes and everything should run smoothly.
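>
> for example, a minimal .pbs script along these lines (untested, adjust the
> module names and paths to your site) should be enough:
>
> #!/bin/bash
> #PBS -l nodes=2:ppn=16
> cd $PBS_O_WORKDIR
> module load openmpi
> module load nwchem
> # no --hostfile needed here: mpirun picks up the node list PBS allocated
> # (if your Open MPI was not built with tm support, you can still pass
> #  --hostfile $PBS_NODEFILE explicitly)
> /.../bin/mpirun -np 32 /.../nwchem my_code.nw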
>
> first, I recommend you run a non-MPI application:
> /.../bin/mpirun hostname
> and then nwchem.
>
> if it still does not work, then run with verbose plm and post the output
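> i.e. something along these lines (with the same elided paths as before):
> /.../bin/mpirun --mca plm_base_verbose 100 -np 32 /.../nwchem my_code.nw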
>
> Cheers,
>
> Gilles
>
> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>
>> I'm on an HPC cluster, so openmpi-1.6.4 here is installed as a module.
>> In the .pbs script, before the line that runs my code, I'm loading both the
>> "nwchem" and "openmpi" modules.
>> It works very nicely when I run on a single node (with 16 processors), but
>> if I try to switch to multiple nodes with the "hostfile" option, things
>> start to crash.
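>>
>> The relevant part of the .pbs script looks roughly like this (paths trimmed
>> for the email):
>>
>> module load openmpi
>> module load nwchem
>> # single node, 16 processors -- this works:
>> # mpirun -np 16 nwchem my_code.nw
>> # two nodes via the hostfile -- this crashes:
>> mpirun -np 32 --hostfile myhostfile nwchem my_code.nw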
>>
>> On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I have tried using full paths for both of them, but I'm stuck on the same
>>> issue.
>>>
>>> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Is ompi installed on the other node, and at the same location?
>>>> did you configure ompi with --enable-mpirun-prefix-by-default?
>>>> (note that should not be necessary if you invoke mpirun with the full path)
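>>>>
>>>> for example, something like this (the install prefix below is only a
>>>> placeholder, use wherever your openmpi module really lives):
>>>> /opt/openmpi-1.6.4/bin/mpirun --prefix /opt/openmpi-1.6.4 -x LD_LIBRARY_PATH -np 32 ...
>>>> the full path (or --prefix) tells the remote orted where Open MPI is
>>>> installed, and -x forwards an environment variable to the remote nodes.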
>>>>
>>>> you can also try
>>>> /.../bin/mpirun --mca plm_base_verbose 100 ...
>>>>
>>>> and see if there is something wrong
>>>>
>>>> last but not least, can you try to use the full path for both mpirun and
>>>> nwchem?
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, I have tried this and got the following error:
>>>>>
>>>>> *mpirun was unable to launch the specified application as it could not
>>>>> find an executable:*
>>>>>
>>>>> *Executable: nwchem*
>>>>> *Node: cx934*
>>>>>
>>>>> *while attempting to start process rank 16.*
>>>>>
>>>>> For reference: I have to run my code with the "nwchem filename.nw" command.
>>>>> When I run the same thing on 1 node with 16 processors, it works fine
>>>>> (mpirun -np 16 nwchem filename.nw).
>>>>> I can't understand why I'm having problems when trying to go multi-node.
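>>>>>
>>>>> Would something like this be the right way to pass the full path (assuming
>>>>> nwchem really is installed at the same location on every node)?
>>>>> /.../bin/mpirun -np 32 --hostfile myhostfile $(which nwchem) filename.nw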
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <
>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>>> Can you try invoking mpirun with its full path instead?
>>>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here are the other details:
>>>>>>>
>>>>>>> a. The Open MPI version is 1.6.4
>>>>>>>
>>>>>>> b. The error being generated is:
>>>>>>> *Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of
>>>>>>> known hosts.*
>>>>>>> *Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of
>>>>>>> known hosts.*
>>>>>>> *orted: Command not found.*
>>>>>>> *orted: Command not found.*
>>>>>>>
>>>>>>> *--------------------------------------------------------------------------*
>>>>>>> *A daemon (pid 53580) died unexpectedly with status 1 while
>>>>>>> attempting*
>>>>>>> *to launch so we are aborting.*
>>>>>>>
>>>>>>> *There may be more information reported by the environment (see
>>>>>>> above).*
>>>>>>>
>>>>>>> *This may be because the daemon was unable to find all the needed
>>>>>>> shared*
>>>>>>> *libraries on the remote node. You may set your LD_LIBRARY_PATH to
>>>>>>> have the*
>>>>>>> *location of the shared libraries on the remote nodes and this will*
>>>>>>> *automatically be forwarded to the remote nodes.*
>>>>>>>
>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>
>>>>>>> *--------------------------------------------------------------------------*
>>>>>>> *mpirun noticed that the job aborted, but has no info as to the
>>>>>>> process*
>>>>>>> *that caused that situation.*
>>>>>>>
>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>
>>>>>>>
>>>>>>> I can't understand why the "command not found" error is being raised.
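>>>>>>>
>>>>>>> Do I need to export something on the remote nodes myself, e.g. in
>>>>>>> ~/.bashrc, along these lines? (The openmpi path here is only a
>>>>>>> placeholder for wherever the module actually puts it.)
>>>>>>> export PATH=/opt/openmpi-1.6.4/bin:$PATH
>>>>>>> export LD_LIBRARY_PATH=/opt/openmpi-1.6.4/lib:$LD_LIBRARY_PATH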
>>>>>>> Thank you.
>>>>>>>
>>>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Would you please tell us:
>>>>>>>>
>>>>>>>> (a) what version of OMPI you are using
>>>>>>>>
>>>>>>>> (b) what error message you are getting when the job terminates
>>>>>>>>
>>>>>>>>
>>>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I'm working on an Open MPI-enabled cluster. I'm trying to run a job
>>>>>>>> with 2 different nodes and 16 processors per node.
>>>>>>>> Using this command:
>>>>>>>>
>>>>>>>> *mpirun -np 32 --hostfile myhostfile -loadbalance exe*
>>>>>>>>
>>>>>>>> The contents of myhostfile:
>>>>>>>>
>>>>>>>> *cx0937 slots=16*
>>>>>>>> *cx0934 slots=16*
>>>>>>>>
>>>>>>>>
>>>>>>>> But the job gets terminated each time before the processes are
>>>>>>>> allocated across the nodes in the desired way.
>>>>>>>>
>>>>>>>> So it would be very nice to get some suggestions about what I'm
>>>>>>>> missing.
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> --
>>>>>>>> Abhisek Mondal
>>>>>>>>
>>>>>>>> *Research Fellow*
>>>>>>>>
>>>>>>>> *Structural Biology and Bioinformatics*
>>>>>>>> *Indian Institute of Chemical Biology*
>>>>>>>>
>>>>>>>> *Kolkata 700032*
>>>>>>>>
>>>>>>>> *INDIA*
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27367.php
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Abhisek Mondal
>>>>>>>
>>>>>>> *Research Fellow*
>>>>>>>
>>>>>>> *Structural Biology and Bioinformatics*
>>>>>>> *Indian Institute of Chemical Biology*
>>>>>>>
>>>>>>> *Kolkata 700032*
>>>>>>>
>>>>>>> *INDIA*
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27369.php
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Abhisek Mondal
>>>>>
>>>>> *Research Fellow*
>>>>>
>>>>> *Structural Biology and Bioinformatics*
>>>>> *Indian Institute of Chemical Biology*
>>>>>
>>>>> *Kolkata 700032*
>>>>>
>>>>> *INDIA*
>>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2015/08/27371.php
>>>>
>>>
>>>
>>>
>>> --
>>> Abhisek Mondal
>>>
>>> *Research Fellow*
>>>
>>> *Structural Biology and Bioinformatics*
>>> *Indian Institute of Chemical Biology*
>>>
>>> *Kolkata 700032*
>>>
>>> *INDIA*
>>>
>>
>>
>>
>> --
>> Abhisek Mondal
>>
>> *Research Fellow*
>>
>> *Structural Biology and Bioinformatics*
>> *Indian Institute of Chemical Biology*
>>
>> *Kolkata 700032*
>>
>> *INDIA*
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27375.php
>



-- 
Abhisek Mondal

*Research Fellow*

*Structural Biology and Bioinformatics*
*Indian Institute of Chemical Biology*

*Kolkata 700032*

*INDIA*
