Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-13 Thread Rahul Yadav
I get the following output with the verbose option:

[JARVICENAE27:00654] mca: base: components_register: registering ras components
[JARVICENAE27:00654] mca: base: components_register: found loaded component loadleveler
[JARVICENAE27:00654] mca: base: components_register: component loadleveler register function successful
[JARVICENAE27:00654] mca: base: components_register: found loaded component simulator
[JARVICENAE27:00654] mca: base: components_register: component simulator register function successful
[JARVICENAE27:00654] mca: base: components_register: found loaded component slurm
[JARVICENAE27:00654] mca: base: components_register: component slurm register function successful
[JARVICENAE27:00654] mca: base: components_open: opening ras components
[JARVICENAE27:00654] mca: base: components_open: found loaded component loadleveler
[JARVICENAE27:00654] mca: base: components_open: component loadleveler open function successful
[JARVICENAE27:00654] mca: base: components_open: found loaded component simulator
[JARVICENAE27:00654] mca: base: components_open: found loaded component slurm
[JARVICENAE27:00654] mca: base: components_open: component slurm open function successful
[JARVICENAE27:00654] mca:base:select: Auto-selecting ras components
[JARVICENAE27:00654] mca:base:select:(  ras) Querying component [loadleveler]
[JARVICENAE27:00654] mca:base:select:(  ras) Skipping component [loadleveler]. Query failed to return a module
[JARVICENAE27:00654] mca:base:select:(  ras) Querying component [simulator]
[JARVICENAE27:00654] mca:base:select:(  ras) Skipping component [simulator]. Query failed to return a module
[JARVICENAE27:00654] mca:base:select:(  ras) Querying component [slurm]
[JARVICENAE27:00654] mca:base:select:(  ras) Skipping component [slurm]. Query failed to return a module
[JARVICENAE27:00654] mca:base:select:(  ras) No component selected!

==   ALLOCATED NODES   ==
   JARVICENAE27: slots=1 max_slots=0 slots_inuse=0 state=UP
   10.3.0.176: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
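
For reference, a sketch of the kind of command line that produces this output: the mpirun line quoted further down with the suggested --mca ras_base_verbose 10 added. Where the flag sits on the line is an assumption, and run is a hypothetical dry-run helper that only prints the command instead of executing it.

```shell
# Hypothetical dry-run helper: print the command instead of executing it.
run() { printf '%s\n' "$*"; }

# Original mpirun line from this thread plus the ras verbosity flag.
# Drop the leading "run" to actually launch the job inside the chroot.
run mpirun --allow-run-as-root --mca pml yalla --mca ras_base_verbose 10 \
    -n 1 --hostfile /root/host1 /root/app2 : \
    -n 1 --hostfile /root/host2 /root/backend
```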

Also, I am not able to ssh from one machine to the other inside the chroot
environment. Could that be a problem?

Thanks
Rahul

On Thu, May 7, 2015 at 8:06 AM, Ralph Castain  wrote:

> Try adding --mca ras_base_verbose 10 to your cmd line and let’s see what it
> thinks it is doing. Which OMPI version are you using - master?
>
>
> On May 6, 2015, at 11:24 PM, Rahul Yadav  wrote:
>
> Hi,
>
> We have been trying to run an MPI job consisting of two different binaries,
> one on each of two nodes, using the hostfile option as follows:
>
> mpirun --allow-run-as-root --mca pml yalla -n 1 --hostfile /root/host1
> /root/app2 : -n 1 --hostfile /root/host2 /root/backend
>
> We are doing this in a chroot environment. We have set up the HPCX env inside
> the chroot'ed environment itself. /root/host1 and /root/host2 (inside the
> chroot env) each contain the IP of one of the two nodes.
>
> We are getting the following error:
>
> "all nodes which are allocated for this job are already filled"
>
> However, it works when we use chroot without the hostfile option (both
> processes run on the same node), or when we use the hostfile option outside
> the chroot.
>
> Does anyone know whether chroot can cause the above error, and how to solve it?
>
> Thanks
> Rahul
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/05/26845.php


Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-13 Thread Ralph Castain
Okay, so we see two nodes have been allocated:

1. JARVICENAE27 - appears to be the node where mpirun is running

2. 10.3.0.176

Does that match what you expected?

If you cannot ssh (without a password) between machines, then we will not be 
able to run.
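
A minimal sketch of one way to check this from inside the chroot, assuming the OpenSSH client is what gets used for the remote launch. The function names can_ssh and check_hosts are hypothetical helpers, not part of Open MPI. BatchMode=yes makes ssh fail instead of prompting for a password, so a failure here means password-less login is not set up (or the host is unreachable).

```shell
# Return 0 if non-interactive (key-based) ssh to the given host succeeds.
# BatchMode=yes disables password prompts; ConnectTimeout bounds the wait.
can_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Hypothetical helper: report the result for each host given as an argument.
check_hosts() {
    for host in "$@"; do
        if can_ssh "$host"; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

For example, running check_hosts 10.3.0.176 on JARVICENAE27 (and the reverse check from the other node) inside the chroot should print "ok" before mpirun is attempted.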





Re: [OMPI users] OpenMPI on Windows without Cygwin

2015-05-13 Thread Damien

Walt,

I don't remember seeing a response to this.  OpenMPI isn't supported on 
native Windows anymore.  The last version for Windows was the 1.6 series.


Damien

On 2015-05-11 3:07 PM, Walt Brainerd wrote:

Is it possible to build OpenMPI for Windows
not running Cygwin?

I know it uses /dev/shm, so there would have to
be something equivalent to that not in Cygwin.

TIA.

--
Walt Brainerd






Re: [OMPI users] OpenMPI on Windows without Cygwin

2015-05-13 Thread Walt Brainerd
No, I hadn't received any response.
That is too bad.
Knowing that earlier would have saved some hours.

Some day I'll look again at extracting some set of stuff
from Cygwin that will make it work. Maybe even that
is not possible. But Cygwin is huge. OTOH, maybe anybody
who is contemplating using Coarrays would be somebody
who has Cygwin anyway.




-- 
Walt Brainerd


Re: [OMPI users] OpenMPI on Windows without Cygwin

2015-05-13 Thread Damien

Depending on what you need, the old 1.6 version might still work.

Damien
