Thanks for the info, Ralph. It is as I thought, but I was hoping it
wouldn't be that way.
I am requesting more nodes from the resource manager from inside my
application code using the RM's API. When I know they are available
(allocated by the RM), I am trying to split the application data across
the newly allocated nodes from inside MPI.

Any ideas?

Prakash

>>> r...@lanl.gov 04/02/07 12:11 PM >>>
The runtime underneath Open MPI (called OpenRTE) will not allow you to
spawn processes on nodes outside of your allocation. This is for several
reasons, but primarily because (a) we only know about the nodes that were
allocated, so we have no idea how to spawn a process anywhere else, and
(b) most resource managers wouldn't let us do it anyway.
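
As a rough illustration of the constraint described above, an application could check whether a target host is part of its allocation before passing it as the "host" info key to MPI_Comm_spawn. The sketch below is only that: it assumes the allocation is available as a text file with one hostname per line (the form Torque uses for the file named by the PBS_NODEFILE environment variable), and the helper name host_in_nodefile is hypothetical, not part of Open MPI or Torque.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `host` appears as a whole line in the
 * node file at `path`, 0 if it does not, and -1 if the file cannot be
 * opened. Assumes one hostname per line, shorter than 255 characters. */
int host_in_nodefile(const char *path, const char *host)
{
    char line[256];
    FILE *fp;
    int found = 0;

    assert(path != NULL && host != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (strcmp(line, host) == 0) {        /* exact match only */
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

In a job like the one in this thread, `path` would typically come from getenv("PBS_NODEFILE"), and a result of 0 for "wins08" would indicate that the spawn request targets a node outside the allocation. The comparison is an exact string match, so short and fully qualified hostnames would need to be normalized first.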

I gather you have some node that you know about and have hard-coded into
your application? How do you know the name of the node if it isn't in
your allocation??

Ralph


On 4/2/07 10:05 AM, "Prakash Velayutham" <prakash.velayut...@cchmc.org>
wrote:

> Hello,
> 
> I have built Open MPI (1.2) with run-time environment enabled for
> Torque (2.1.6) resource manager. Initially I am requesting 4 nodes
> (1 CPU each) from Torque. Then from inside of my MPI code I am trying
> to spawn more processes to nodes outside of the Torque-assigned nodes
> using MPI_Comm_spawn, but this is failing with the error below:
> 
> [wins04:13564] *** An error occurred in MPI_Comm_spawn
> [wins04:13564] *** on communicator MPI_COMM_WORLD
> [wins04:13564] *** MPI_ERR_ARG: invalid argument of some other kind
> [wins04:13564] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 1 with PID 15070 on node wins03 exited on
> signal 15 (Terminated).
> 2 additional processes aborted (not shown)
> 
> #################################
> 
>         MPI_Info info;
>         MPI_Comm comm, *intercomm;
> ...
> ...
>         char *key, *value;
>         key = "host";
>         value = "wins08";
>         rc1 = MPI_Info_create(&info);
>         rc1 = MPI_Info_set(info, key, value);
>         rc1 = MPI_Comm_spawn(slave,MPI_ARGV_NULL, 1, info, 0,
> MPI_COMM_WORLD, intercomm, arr);
> ...
> }
> 
> ###################################################
> 
> Would this work as it is or is something wrong with my assumption? Is
> OpenRTE stopping me from spawning processes outside of the initially
> allocated nodes through Torque?
> 
> Thanks,
> Prakash
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

