No offense, but I would definitely advise against that path. There are
other, much simpler solutions to dynamically add hosts.

We *do* allow dynamic allocation changes - you just have to know how to do
them. Nobody asked before... ;-)  Future versions will include an even
simpler, single-API solution.
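
Just to make the shape of it concrete, here is a rough sketch (illustration
only, not the official recipe) of what such a spawn can look like from the
application side. It assumes the build honors an "add-host" info key on
MPI_Comm_spawn; later Open MPI releases document that key, but whether a
given 1.2 build accepts it is something to verify. The executable name is
just a placeholder.

    /* Sketch only: inside an already-initialized MPI program, spawn one
       extra process on a host that is not in the current allocation.
       The "add-host" key is the assumption here; the rest is plain MPI-2
       dynamic process management. */
    MPI_Info info;
    MPI_Comm intercomm;
    int      errcodes[1];

    MPI_Info_create(&info);
    MPI_Info_set(info, "add-host", "wins08");      /* host to add */
    MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 1, info,
                   0 /* root */, MPI_COMM_WORLD, &intercomm, errcodes);
    MPI_Info_free(&info);

With the plain "host" key, the named node already has to be part of the
allocation, which is exactly the restriction discussed below.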

I'll pass the current solution along in response to the other user's last
note.
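
One unrelated note on the snippet quoted below: the intercomm pointer handed
to MPI_Comm_spawn is never initialized, so even once the allocation question
is sorted out it is worth declaring a plain MPI_Comm and passing its address.
A minimal corrected sketch (the executable name is a placeholder; wins03 is
used only because the error output shows it is inside the allocation):

    /* Inside an already-initialized MPI program. */
    char    *slave = "./slave";                    /* placeholder command */
    MPI_Info info;
    MPI_Comm intercomm;                            /* plain handle */
    int      errcodes[1];

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "wins03");          /* stays in the allocation */
    MPI_Comm_spawn(slave, MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_WORLD, &intercomm, errcodes);
    MPI_Info_free(&info);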
Ralph


On 4/2/07 10:34 AM, "Jeremy Buisson" <jbuis...@irisa.fr> wrote:

> Ralph Castain wrote:
>> The runtime underneath Open MPI (called OpenRTE) will not allow you to spawn
>> processes on nodes outside of your allocation. This is for several reasons,
>> but primarily because (a) we only know about the nodes that were allocated,
>> so we have no idea how to spawn a process anywhere else, and (b) most
>> resource managers wouldn't let us do it anyway.
>> 
>> I gather you have some node that you know about and have hard-coded into
>> your application? How do you know the name of the node if it isn't in your
>> allocation??
> 
> Because I can pass those names to Open MPI (or OpenRTE, or whatever). I
> would like to do exactly that, and I don't want Open MPI to restrict me to
> what it thinks the allocation is when I am sure I know better than it does
> what I am doing.
> The concept of an allocation fixed at launch time is really rigid; it
> prevents the application (or anything else) from modifying the allocation
> at runtime, which would be quite useful.
> 
> Here is an ugly patch I quickly wrote for my own use. It changes the
> round-robin rmaps component so that it first allocates the requested hosts
> to the rmgr, copied and pasted from some code in the dash_host ras
> component. It is far from bug-free, but it can be a starting point to hack
> on.
> 
> Jeremy
> 
>> Ralph
>> 
>> 
>> On 4/2/07 10:05 AM, "Prakash Velayutham" <prakash.velayut...@cchmc.org> wrote:
>> 
>>> Hello,
>>> 
>>> I have built Open MPI (1.2) with the run-time environment enabled for the
>>> Torque (2.1.6) resource manager. Initially I request 4 nodes (1 CPU each)
>>> from Torque. Then, from inside my MPI code, I try to spawn additional
>>> processes onto nodes outside the Torque-assigned set using
>>> MPI_Comm_spawn, but this fails with the error below:
>>> 
>>> [wins04:13564] *** An error occurred in MPI_Comm_spawn
>>> [wins04:13564] *** on communicator MPI_COMM_WORLD
>>> [wins04:13564] *** MPI_ERR_ARG: invalid argument of some other kind
>>> [wins04:13564] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> mpirun noticed that job rank 1 with PID 15070 on node wins03 exited on
>>> signal 15 (Terminated).
>>> 2 additional processes aborted (not shown)
>>> 
>>> #################################
>>> 
>>>         MPI_Info info;
>>>         MPI_Comm comm, *intercomm;
>>> ...
>>> ...
>>>         char *key, *value;
>>>         key = "host";
>>>         value = "wins08";
>>>         rc1 = MPI_Info_create(&info);
>>>         rc1 = MPI_Info_set(info, key, value);
>>>         rc1 = MPI_Comm_spawn(slave, MPI_ARGV_NULL, 1, info, 0,
>>>                              MPI_COMM_WORLD, intercomm, arr);
>>> ...
>>> }
>>> 
>>> ###################################################
>>> 
>>> Would this work as it is or is something wrong with my assumption? Is
>>> OpenRTE stopping me from spawning processes outside of the initially
>>> allocated nodes through Torque?
>>> 
>>> Thanks,
>>> Prakash
>>> 
>> 
>> 
> 
> diff -ru openmpi-1.2/ompi/mca/btl/tcp/btl_tcp.c openmpi-1.2-custom/ompi/mca/btl/tcp/btl_tcp.c
> --- openmpi-1.2/ompi/mca/btl/tcp/btl_tcp.c 2006-11-09 19:53:44.000000000 +0100
> +++ openmpi-1.2-custom/ompi/mca/btl/tcp/btl_tcp.c 2007-03-28 14:02:10.000000000 +0200
> @@ -117,8 +117,8 @@
>          tcp_endpoint->endpoint_btl = tcp_btl;
>          rc = mca_btl_tcp_proc_insert(tcp_proc, tcp_endpoint);
>          if(rc != OMPI_SUCCESS) {
> -            OBJ_RELEASE(tcp_endpoint);
>              OPAL_THREAD_UNLOCK(&tcp_proc->proc_lock);
> +            OBJ_RELEASE(tcp_endpoint);
>              continue;
>          }
>  
> diff -ru openmpi-1.2/opal/threads/mutex.c openmpi-1.2-custom/opal/threads/mutex.c
> --- openmpi-1.2/opal/threads/mutex.c 2006-11-09 19:53:32.000000000 +0100
> +++ openmpi-1.2-custom/opal/threads/mutex.c 2007-03-28 15:59:25.000000000 +0200
> @@ -54,6 +54,8 @@
>  #elif OMPI_ENABLE_DEBUG && OMPI_HAVE_PTHREAD_MUTEX_ERRORCHECK
>      /* set type to ERRORCHECK so that we catch recursive locks */
>      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
> +#else
> +    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
>  #endif
>      
>      pthread_mutex_init(&m->m_lock_pthread, &attr);
> diff -ru openmpi-1.2/opal/threads/mutex_unix.h openmpi-1.2-custom/opal/threads/mutex_unix.h
> --- openmpi-1.2/opal/threads/mutex_unix.h 2006-11-09 19:53:32.000000000 +0100
> +++ openmpi-1.2-custom/opal/threads/mutex_unix.h 2007-03-28 15:36:13.000000000 +0200
> @@ -76,7 +76,7 @@
>  
>  static inline int opal_mutex_trylock(opal_mutex_t *m)
>  {
> -#if OMPI_ENABLE_DEBUG
> +#if 1 // OMPI_ENABLE_DEBUG
>      int ret = pthread_mutex_trylock(&m->m_lock_pthread);
>      if (ret == EDEADLK) {
>          errno = ret;
> @@ -91,7 +91,7 @@
>  
>  static inline void opal_mutex_lock(opal_mutex_t *m)
>  {
> -#if OMPI_ENABLE_DEBUG
> +#if 1 // OMPI_ENABLE_DEBUG
>      int ret = pthread_mutex_lock(&m->m_lock_pthread);
>      if (ret == EDEADLK) {
>          errno = ret;
> diff -ru openmpi-1.2/opal/util/stacktrace.c openmpi-1.2-custom/opal/util/stacktrace.c
> --- openmpi-1.2/opal/util/stacktrace.c 2007-01-24 19:16:07.000000000 +0100
> +++ openmpi-1.2-custom/opal/util/stacktrace.c 2007-03-28 14:02:10.000000000 +0200
> @@ -344,6 +344,8 @@
>                     stacktrace_hostname, getpid());
>      write(fileno(stderr), print_buffer, ret);
>      fflush(stderr);
> +    for(;;)
> +      pause();
>  }
>  
>  #endif /* OMPI_WANT_PRETTY_PRINT_STACKTRACE && ! defined(__WINDOWS__) */
> diff -ru openmpi-1.2/orte/mca/rmaps/round_robin/rmaps_rr.c openmpi-1.2-custom/orte/mca/rmaps/round_robin/rmaps_rr.c
> --- openmpi-1.2/orte/mca/rmaps/round_robin/rmaps_rr.c 2007-01-24 19:16:10.000000000 +0100
> +++ openmpi-1.2-custom/orte/mca/rmaps/round_robin/rmaps_rr.c 2007-03-28 15:11:57.000000000 +0200
> @@ -265,6 +265,134 @@
>  
>      return ORTE_SUCCESS;
>  }
> +
> +static bool orte_rmaps_rr_is_host_allocated(char* name)
> +{
> +    orte_ras_node_t* node = orte_ras_base_node_lookup(0, name);
> +    if (NULL == node) return false;     /* not in the allocation */
> +    OBJ_RELEASE(node);                  /* drop the looked-up reference */
> +    return true;
> +}
> +
> +static int orte_rmaps_rr_host_allocate(orte_jobid_t jobid)
> +{
> +    opal_list_t nodes;
> +    opal_list_item_t* item;
> +    orte_app_context_t **context;
> +    size_t i, j, k;
> +    orte_std_cntr_t num_context = 0;
> +    int rc;
> +    char **mapped_nodes = NULL, **mini_map;
> +    orte_ras_node_t *node;
> +
> +    /* get the context */
> +
> +    rc = orte_rmgr.get_app_context(jobid, &context, &num_context);
> +    if (ORTE_SUCCESS != rc) {
> +        ORTE_ERROR_LOG(rc);
> +        return rc;
> +    }
> +    OBJ_CONSTRUCT(&nodes, opal_list_t);
> +
> +    /* If there's nothing to do, skip to the end */
> +
> +    if (0 == num_context) {
> +        rc = ORTE_SUCCESS;
> +        goto cleanup;
> +    }
> +
> +    /* Otherwise, go through the contexts */
> +
> +    for (i = 0; i < num_context; ++i) {
> +      if (context[i] != 0) {
> +        if (context[i]->num_map > 0) {
> +            orte_app_context_map_t** map = context[i]->map_data;
> +
> +            /* Accumulate all of the host name mappings */
> +            for (j = 0; j < context[i]->num_map; ++j) {
> +                if (ORTE_APP_CONTEXT_MAP_HOSTNAME == map[j]->map_type) {
> +                    mini_map = opal_argv_split(map[j]->map_data, ',');
> +                    for (k = 0; NULL != mini_map[k]; ++k) {
> +                        if (!orte_rmaps_rr_is_host_allocated(mini_map[k]))
> +                        {
> +                            rc = opal_argv_append_nosize(&mapped_nodes,
> +                                                         mini_map[k]);
> +                            if (OPAL_SUCCESS != rc) {
> +                                goto cleanup;
> +                            }
> +                        }
> +                    }
> +                    opal_argv_free(mini_map);
> +                }
> +            }
> +        }
> +      }
> +    }
> +
> +    /* Did we find anything? */
> +
> +    if (NULL != mapped_nodes) {
> +
> +        /* Go through the names found and add them to the host list.
> +           If they're not unique, then bump the slots count for each
> +           duplicate */
> +
> +        for (i = 0; NULL != mapped_nodes[i]; ++i) {
> +            for (item = opal_list_get_first(&nodes);
> +                 item != opal_list_get_end(&nodes);
> +                 item = opal_list_get_next(item)) {
> +                node = (orte_ras_node_t*) item;
> +                if (0 == strcmp(node->node_name, mapped_nodes[i])) {
> +                    ++node->node_slots;
> +                    break;
> +                }
> +            }
> +
> +            /* If we didn't find it, add it to the list */
> +
> +            if (item == opal_list_get_end(&nodes)) {
> +                node = OBJ_NEW(orte_ras_node_t);
> +                if (NULL == node) {
> +                    return ORTE_ERR_OUT_OF_RESOURCE;
> +                }
> +                node->node_name = strdup(mapped_nodes[i]);
> +                node->node_arch = NULL;
> +                node->node_state = ORTE_NODE_STATE_UP;
> +                /* JMS: this should not be hard-wired to 0, but there's no
> +                   other value to put it to [yet]... */
> +                node->node_cellid = 0;
> +                node->node_slots_inuse = 0;
> +                node->node_slots_max = 0;
> +                node->node_slots = 1;
> +                opal_list_append(&nodes, &node->super);
> +            }
> +        }
> +
> +        /* Put them on the segment and allocate them */
> +
> +        if (ORTE_SUCCESS !=
> +            (rc = orte_ras_base_node_insert(&nodes)) ||
> +            ORTE_SUCCESS !=
> +            (rc = orte_ras_base_allocate_nodes(jobid, &nodes))) {
> +            goto cleanup;
> +        }
> +    }
> +
> +cleanup:
> +    if (NULL != mapped_nodes) {
> +        opal_argv_free(mapped_nodes);
> +    }
> +
> +    while (NULL != (item = opal_list_remove_first(&nodes))) {
> +        OBJ_RELEASE(item);
> +    }
> +    OBJ_DESTRUCT(&nodes);
> +    for (i = 0; i < num_context; i++) {
> +        OBJ_RELEASE(context[i]);
> +    }
> +    free(context);
> +    return rc;
> +}
>     
>  
>  /*
> @@ -367,6 +495,11 @@
>      orte_attribute_t *attr;
>      orte_std_cntr_t slots_per_node;
>  
> +    if (ORTE_SUCCESS != (rc = orte_rmaps_rr_host_allocate(jobid))) {
> +        ORTE_ERROR_LOG(rc);
> +        return rc;
> +    }
> +
>      OPAL_TRACE(1);
>      
>      /* setup the local environment from the attributes */


