Okay, so we see two nodes have been allocated:

1. JARVICENAE27 - appears to be the node where mpirun is running

2. 10.3.0.176

Does that match what you expected?

If you cannot ssh between the machines without a password, then mpirun will not be 
able to launch processes on the remote node.
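
A quick way to check, for example (just a sketch - I'm assuming you launch from 
JARVICENAE27 as root, so adjust the user and IP to your setup):

    ssh root@10.3.0.176 hostname      # must complete without a password prompt

If it prompts, key-based auth can be set up along these lines:

    ssh-keygen -t rsa                 # accept the defaults, empty passphrase
    ssh-copy-id root@10.3.0.176       # copy the public key to the other node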


> On May 13, 2015, at 12:21 AM, Rahul Yadav <robora...@gmail.com> wrote:
> 
> I get the following output with the verbose setting:
> 
> [JARVICENAE27:00654] mca: base: components_register: registering ras components
> [JARVICENAE27:00654] mca: base: components_register: found loaded component loadleveler
> [JARVICENAE27:00654] mca: base: components_register: component loadleveler register function successful
> [JARVICENAE27:00654] mca: base: components_register: found loaded component simulator
> [JARVICENAE27:00654] mca: base: components_register: component simulator register function successful
> [JARVICENAE27:00654] mca: base: components_register: found loaded component slurm
> [JARVICENAE27:00654] mca: base: components_register: component slurm register function successful
> [JARVICENAE27:00654] mca: base: components_open: opening ras components
> [JARVICENAE27:00654] mca: base: components_open: found loaded component loadleveler
> [JARVICENAE27:00654] mca: base: components_open: component loadleveler open function successful
> [JARVICENAE27:00654] mca: base: components_open: found loaded component simulator
> [JARVICENAE27:00654] mca: base: components_open: found loaded component slurm
> [JARVICENAE27:00654] mca: base: components_open: component slurm open function successful
> [JARVICENAE27:00654] mca:base:select: Auto-selecting ras components
> [JARVICENAE27:00654] mca:base:select:(  ras) Querying component [loadleveler]
> [JARVICENAE27:00654] mca:base:select:(  ras) Skipping component [loadleveler]. Query failed to return a module
> [JARVICENAE27:00654] mca:base:select:(  ras) Querying component [simulator]
> [JARVICENAE27:00654] mca:base:select:(  ras) Skipping component [simulator]. Query failed to return a module
> [JARVICENAE27:00654] mca:base:select:(  ras) Querying component [slurm]
> [JARVICENAE27:00654] mca:base:select:(  ras) Skipping component [slurm]. Query failed to return a module
> [JARVICENAE27:00654] mca:base:select:(  ras) No component selected!
> 
> ======================   ALLOCATED NODES   ======================
>        JARVICENAE27: slots=1 max_slots=0 slots_inuse=0 state=UP
>        10.3.0.176: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
> 
> Also, I am not able to ssh from one machine to the other in the chroot 
> environment. Can that be a problem?
> 
> Thanks
> Rahul
> 
> On Thu, May 7, 2015 at 8:06 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Try adding --mca ras_base_verbose 10 to your command line and let's see what it 
> thinks it is doing. Which OMPI version are you using - master?
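> Something along these lines, for example (reusing the mpirun invocation from your 
> mail below; the paths and hostfiles are of course yours):
> 
> mpirun --allow-run-as-root --mca ras_base_verbose 10 --mca pml yalla -n 1 --hostfile /root/host1 /root/app2 : -n 1 --hostfile /root/host2 /root/backend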
> 
> 
>> On May 6, 2015, at 11:24 PM, Rahul Yadav <robora...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> We have been trying to run an MPI job consisting of two different binaries, one 
>> on each of two nodes, using the hostfile option as follows:
>> 
>> mpirun --allow-run-as-root --mca pml yalla -n 1 --hostfile /root/host1 /root/app2 : -n 1 --hostfile /root/host2 /root/backend
>> 
>> We are doing this in a chroot environment, with the HPCX environment set up 
>> inside the chroot itself. /root/host1 and /root/host2 (inside the chroot) 
>> contain the IPs of the two nodes, respectively.
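>> For reference, these are plain Open MPI hostfiles with one host per line, 
>> roughly like this (10.3.0.176 is one of our nodes; the other IP shown here is 
>> just a placeholder, and slots=1 is optional):
>> 
>> # /root/host1
>> 10.3.0.175 slots=1
>> 
>> # /root/host2
>> 10.3.0.176 slots=1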
>> 
>> We are getting following error
>> 
>> " all nodes which are allocated for this job are already filled "
>> 
>> However, when we use chroot but not the hostfile option (both processes run on 
>> the same node), or when we use the hostfile option outside chroot, it works.
>> 
>> Does anyone have any idea whether chroot can cause the above error, and how to solve it?
>> 
>> Thanks
>> Rahul
