Looks to me like you have an error in your cmd line - you aren't specifying the 
number of procs to run. My guess is that the system is hanging trying to 
resolve the process map as a result. Try adding "-np 1" to the cmd line.
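
For example, based on the command shown in your output below (host name and
executable taken from there), that would look something like:

   mpirun -np 1 --host itanium2 helloworld.out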

The output indicates it is dropping slurm because it doesn't see a slurm 
allocation. So it is defaulting to use of rsh/ssh to launch.
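
If you want to take slurm out of the picture entirely while debugging, you
could also force the rsh launcher explicitly -- just a suggestion, reusing
the host/executable from your output:

   mpirun --mca plm rsh -np 1 --host itanium2 helloworld.out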


On Mar 30, 2010, at 4:27 AM, uriz.49...@e.unavarra.es wrote:

> I've been investigating and there is no firewall that could stop TCP
> traffic in the cluster. With the option --mca plm_base_verbose 30 I get
> the following output:
> 
> [itanium1] /home/otro > mpirun --mca plm_base_verbose 30 --host itanium2
> helloworld.out
> [itanium1:08311] mca: base: components_open: Looking for plm components
> [itanium1:08311] mca: base: components_open: opening plm components
> [itanium1:08311] mca: base: components_open: found loaded component rsh
> [itanium1:08311] mca: base: components_open: component rsh has no register
> function
> [itanium1:08311] mca: base: components_open: component rsh open function
> successful
> [itanium1:08311] mca: base: components_open: found loaded component slurm
> [itanium1:08311] mca: base: components_open: component slurm has no
> register function
> [itanium1:08311] mca: base: components_open: component slurm open function
> successful
> [itanium1:08311] mca:base:select: Auto-selecting plm components
> [itanium1:08311] mca:base:select:(  plm) Querying component [rsh]
> [itanium1:08311] mca:base:select:(  plm) Query of component [rsh] set
> priority to 10
> [itanium1:08311] mca:base:select:(  plm) Querying component [slurm]
> [itanium1:08311] mca:base:select:(  plm) Skipping component [slurm]. Query
> failed to return a module
> [itanium1:08311] mca:base:select:(  plm) Selected component [rsh]
> [itanium1:08311] mca: base: close: component slurm closed
> [itanium1:08311] mca: base: close: unloading component slurm
> 
> --Hangs here
> 
> It seems to be a slurm problem?
> 
> Thanks for any ideas
> 
> On Fri, March 19, 2010, 17:57, Ralph Castain wrote:
>> Did you configure OMPI with --enable-debug? You should do this so that
>> more diagnostic output is available.
>> 
>> You can also add the following to your cmd line to get more info:
>> 
>> --debug --debug-daemons --leave-session-attached
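>> 
>> For example, reusing the host/executable from your output (just a sketch
>> showing the daemon-related flags; add them to whatever command you are
>> actually running):
>> 
>>   mpirun --debug-daemons --leave-session-attached -np 1 --host itanium2 helloworld.out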
>> 
>> Something is likely blocking proper launch of the daemons and processes so
>> you aren't getting to the btl's at all.
>> 
>> 
>> On Mar 19, 2010, at 9:42 AM, uriz.49...@e.unavarra.es wrote:
>> 
>>> The processes are running on the remote nodes, but they never send a
>>> response back to the origin node, and I don't know why. With the option
>>> --mca btl_base_verbose 30 I have the same problem and it doesn't show
>>> any additional message.
>>> 
>>> Thanks
>>> 
>>>> On Wed, Mar 17, 2010 at 1:41 PM, Jeff Squyres <jsquy...@cisco.com>
>>>> wrote:
>>>>> On Mar 17, 2010, at 4:39 AM, <uriz.49...@e.unavarra.es> wrote:
>>>>> 
>>>>>> Hi everyone, I'm a new Open MPI user and I have just installed Open MPI
>>>>>> on a 6-node cluster running Scientific Linux. When I run it locally it
>>>>>> works perfectly, but when I try to run it on the remote nodes with the
>>>>>> --host option it hangs and gives no message. I think the problem could
>>>>>> be with the shared libraries, but I'm not sure. In my opinion the
>>>>>> problem is not ssh, because I can access the nodes without a password.
>>>>> 
>>>>> You might want to check that Open MPI processes are actually running
>>>>> on the remote nodes -- check with ps whether you see any "orted" or
>>>>> other MPI-related processes (e.g., your own processes).
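>>>>> 
>>>>> For example, something like this on each remote node while the job is
>>>>> hanging (the grep pattern is just an illustration; adjust it to your
>>>>> executable name):
>>>>> 
>>>>>   ps aux | grep -E 'orted|helloworld'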
>>>>> 
>>>>> Do you have any TCP firewall software running between the nodes?  If
>>>>> so,
>>>>> you'll need to disable it (at least for Open MPI jobs).
>>>> 
>>>> I also recommend running mpirun with the option --mca btl_base_verbose
>>>> 30 to troubleshoot tcp issues.
>>>> 
>>>> In some environments, you need to explicitly tell mpirun what network
>>>> interfaces it can use to reach the hosts. Read the following FAQ
>>>> section for more information:
>>>> 
>>>> http://www.open-mpi.org/faq/?category=tcp
>>>> 
>>>> Item 7 of the FAQ might be of special interest.
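>>>> 
>>>> For example (a sketch only -- eth0 is an assumption, replace it with the
>>>> interface that actually connects your nodes):
>>>> 
>>>>   mpirun --mca btl_tcp_if_include eth0 -np 1 --host itanium2 helloworld.out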
>>>> 
>>>> Regards,
>>>> 