Okay, so we see two nodes have been allocated:

1. JARVICENAE27 - appears to be the node where mpirun is running
2. 10.3.0.176

Does that match what you expected? If you cannot ssh (without a password) between machines, then we will not be able to run.
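A quick way to verify that from inside the chroot (assuming an ssh client and the usual key setup exist in there) is something like:

    # run on the node where mpirun runs; should print the remote hostname with no password prompt
    ssh -o BatchMode=yes 10.3.0.176 hostname

BatchMode makes ssh fail immediately instead of prompting, so a missing or passphrase-protected key shows up right away.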
> On May 13, 2015, at 12:21 AM, Rahul Yadav <robora...@gmail.com> wrote:
>
> I get the following output with verbose:
>
> [JARVICENAE27:00654] mca: base: components_register: registering ras components
> [JARVICENAE27:00654] mca: base: components_register: found loaded component loadleveler
> [JARVICENAE27:00654] mca: base: components_register: component loadleveler register function successful
> [JARVICENAE27:00654] mca: base: components_register: found loaded component simulator
> [JARVICENAE27:00654] mca: base: components_register: component simulator register function successful
> [JARVICENAE27:00654] mca: base: components_register: found loaded component slurm
> [JARVICENAE27:00654] mca: base: components_register: component slurm register function successful
> [JARVICENAE27:00654] mca: base: components_open: opening ras components
> [JARVICENAE27:00654] mca: base: components_open: found loaded component loadleveler
> [JARVICENAE27:00654] mca: base: components_open: component loadleveler open function successful
> [JARVICENAE27:00654] mca: base: components_open: found loaded component simulator
> [JARVICENAE27:00654] mca: base: components_open: found loaded component slurm
> [JARVICENAE27:00654] mca: base: components_open: component slurm open function successful
> [JARVICENAE27:00654] mca:base:select: Auto-selecting ras components
> [JARVICENAE27:00654] mca:base:select:( ras) Querying component [loadleveler]
> [JARVICENAE27:00654] mca:base:select:( ras) Skipping component [loadleveler]. Query failed to return a module
> [JARVICENAE27:00654] mca:base:select:( ras) Querying component [simulator]
> [JARVICENAE27:00654] mca:base:select:( ras) Skipping component [simulator]. Query failed to return a module
> [JARVICENAE27:00654] mca:base:select:( ras) Querying component [slurm]
> [JARVICENAE27:00654] mca:base:select:( ras) Skipping component [slurm]. Query failed to return a module
> [JARVICENAE27:00654] mca:base:select:( ras) No component selected!
>
> ====================== ALLOCATED NODES ======================
> JARVICENAE27: slots=1 max_slots=0 slots_inuse=0 state=UP
> 10.3.0.176: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>
> Also, I am not able to ssh to the other machine from one machine in the chroot environment. Can that be a problem?
>
> Thanks
> Rahul
>
> On Thu, May 7, 2015 at 8:06 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Try adding --mca ras_base_verbose 10 to your cmd line and let's see what it thinks it is doing. Which OMPI version are you using - master?
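Applying that suggestion to the command quoted below would look roughly like this (the original command, with only the verbose flag added):

    mpirun --allow-run-as-root --mca pml yalla --mca ras_base_verbose 10 \
        -n 1 --hostfile /root/host1 /root/app2 : -n 1 --hostfile /root/host2 /root/backend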
>> On May 6, 2015, at 11:24 PM, Rahul Yadav <robora...@gmail.com> wrote:
>>
>> Hi,
>>
>> We have been trying to run MPI jobs (consisting of two different binaries, one on each node) across two nodes, using the hostfile option as follows:
>>
>> mpirun --allow-run-as-root --mca pml yalla -n 1 --hostfile /root/host1 /root/app2 : -n 1 --hostfile /root/host2 /root/backend
>>
>> We are doing this in a chroot environment. We have set up the HPCX env in the chroot'ed environment itself. /root/host1 and /root/host2 (inside the chroot env) contain the IPs of the two nodes, respectively.
>>
>> We are getting the following error:
>>
>> "all nodes which are allocated for this job are already filled"
>>
>> However, when we use chroot but don't use the hostfile option (both processes run on the same node), OR we use the hostfile option but outside chroot, it works.
>>
>> Does anyone have any idea whether chroot can cause the above error and how to solve it?
>>
>> Thanks
>> Rahul
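For reference, those two hostfiles would normally just list one host per line in the standard Open MPI hostfile format. Based on the allocation shown above, the intended contents are presumably something like the following (the exact entries are an assumption, not taken from the thread):

    # /root/host1 (assumed contents)
    JARVICENAE27 slots=1

    # /root/host2 (assumed contents)
    10.3.0.176 slots=1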