I'm on an HPC cluster, so openmpi-1.6.4 is installed here as a module.
In my .pbs script, before the line that executes my code, I load both the
"nwchem" and "openmpi" modules.
It works very nicely when I run on a single node (with 16 processors),
but if I switch to multiple nodes with the "hostfile" option, things
start to crash.
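
For reference, here is roughly what my .pbs script looks like (a minimal
sketch; the resource request line and module names are as they appear on
our cluster and may differ elsewhere):

    #!/bin/bash
    #PBS -l nodes=2:ppn=16
    cd $PBS_O_WORKDIR

    # load the site-provided modules before launching
    module load openmpi
    module load nwchem

    # PBS writes the list of allocated nodes to $PBS_NODEFILE
    mpirun -np 32 --hostfile $PBS_NODEFILE nwchem filename.nw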

On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com>
wrote:

> Hi,
> I have tried using full paths for both of them, but I am stuck on the
> same issue.
>
> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Is ompi installed on the other node, and at the same location?
>> Did you configure ompi with --enable-mpirun-prefix-by-default?
>> (Note that this should not be necessary if you invoke mpirun with its
>> full path.)
>>
>> You can also try
>> /.../bin/mpirun --mca plm_base_verbose 100 ...
>>
>> and see if there is anything wrong in the output.
>>
>> Last but not least, can you try using the full path for both mpirun and
>> nwchem?
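>>
>> For example (a sketch only; both paths below are illustrative
>> placeholders for your actual install locations):
>>
>>     /opt/openmpi-1.6.4/bin/mpirun --mca plm_base_verbose 100 -np 32 \
>>         --hostfile myhostfile /home/user/nwchem/bin/nwchem filename.nw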
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>>
>>> Yes, I have tried this and got the following error:
>>>
>>> mpirun was unable to launch the specified application as it could not
>>> find an executable:
>>>
>>> Executable: nwchem
>>> Node: cx934
>>>
>>> while attempting to start process rank 16.
>>>
>>> Given that I have to run my code with the "nwchem filename.nw" command:
>>> when I run the same thing on 1 node with 16 processors, it works fine
>>> (mpirun -np 16 nwchem filename.nw).
>>> I can't understand why I have a problem when I try to go multinode.
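>>>
>>> So the failing multinode invocation is effectively (with "myhostfile"
>>> being the two-node hostfile from my first mail):
>>>
>>>     mpirun -np 32 --hostfile myhostfile -loadbalance nwchem filename.nw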
>>>
>>> Thanks.
>>>
>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Can you try invoking mpirun with its full path instead?
>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>> wrote:
>>>>
>>>>> Here is the other details,
>>>>>
>>>>> a. The Open MPI version is 1.6.4.
>>>>>
>>>>> b. The error being generated is:
>>>>> Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of
>>>>> known hosts.
>>>>> Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of
>>>>> known hosts.
>>>>> orted: Command not found.
>>>>> orted: Command not found.
>>>>> --------------------------------------------------------------------------
>>>>> A daemon (pid 53580) died unexpectedly with status 1 while attempting
>>>>> to launch so we are aborting.
>>>>>
>>>>> There may be more information reported by the environment (see above).
>>>>>
>>>>> This may be because the daemon was unable to find all the needed shared
>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>> the location of the shared libraries on the remote nodes and this will
>>>>> automatically be forwarded to the remote nodes.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> I am not able to understand why the "command not found" error is
>>>>> being raised.
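>>>>>
>>>>> As a sanity check (assuming the launcher reaches the nodes over ssh,
>>>>> which the "known hosts" warnings above suggest), something like this
>>>>> should show whether a non-interactive remote shell can find orted,
>>>>> and --prefix is one way to point the remote daemons at the install
>>>>> tree (the path below is a placeholder for the real one):
>>>>>
>>>>>     # does a non-interactive shell on the remote node see orted at all?
>>>>>     ssh cx0934 which orted
>>>>>
>>>>>     # point the remote daemons at the Open MPI install tree explicitly
>>>>>     mpirun --prefix /path/to/openmpi-1.6.4 -np 32 \
>>>>>         --hostfile myhostfile nwchem filename.nw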
>>>>> Thank you.
>>>>>
>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org>
>>>>> wrote:
>>>>>
>>>>>> Would you please tell us:
>>>>>>
>>>>>> (a) what version of OMPI you are using
>>>>>>
>>>>>> (b) what error message you are getting when the job terminates
>>>>>>
>>>>>>
>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I'm working on an Open MPI-enabled cluster. I'm trying to run a job
>>>>>> across 2 different nodes with 16 processors per node, using this
>>>>>> command:
>>>>>>
>>>>>> mpirun -np 32 --hostfile myhostfile -loadbalance exe
>>>>>>
>>>>>> The contents of myhostfile:
>>>>>>
>>>>>> cx0937 slots=16
>>>>>> cx0934 slots=16
>>>>>>
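>>>>>> Annotated, in case the format matters ('#' lines are comments in an
>>>>>> Open MPI hostfile):
>>>>>>
>>>>>>     # one line per node: hostname plus the number of ranks it may run
>>>>>>     cx0937 slots=16
>>>>>>     cx0934 slots=16
>>>>>>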
>>>>>>
>>>>>> But the job gets terminated each time before the allocation happens
>>>>>> the way I want.
>>>>>>
>>>>>> So, it would be very nice to get some suggestions about what I'm
>>>>>> missing.
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> --
>>>>>> Abhisek Mondal
>>>>>>
>>>>>> Research Fellow
>>>>>>
>>>>>> Structural Biology and Bioinformatics
>>>>>> Indian Institute of Chemical Biology
>>>>>>
>>>>>> Kolkata 700032
>>>>>>
>>>>>> INDIA
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27367.php
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Abhisek Mondal
>>>>>
>>>>> Research Fellow
>>>>>
>>>>> Structural Biology and Bioinformatics
>>>>> Indian Institute of Chemical Biology
>>>>>
>>>>> Kolkata 700032
>>>>>
>>>>> INDIA
>>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2015/08/27369.php
>>>>
>>>
>>>
>>>
>>> --
>>> Abhisek Mondal
>>>
>>> Research Fellow
>>>
>>> Structural Biology and Bioinformatics
>>> Indian Institute of Chemical Biology
>>>
>>> Kolkata 700032
>>>
>>> INDIA
>>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/08/27371.php
>>
>
>
>
> --
> Abhisek Mondal
>
> Research Fellow
>
> Structural Biology and Bioinformatics
> Indian Institute of Chemical Biology
>
> Kolkata 700032
>
> INDIA
>



--
Abhisek Mondal

Research Fellow

Structural Biology and Bioinformatics
Indian Institute of Chemical Biology

Kolkata 700032

INDIA
