The "updated" field in the orte_job_t structure is only used to help reduce the
size of the launch message sent to all the daemons. Basically, we only include
info on jobs that have been changed - thus, it only gets used when the app
calls comm_spawn. After every launch, we automatically change i
I confirmed that your fix works well for me. However, I think we should at
least add the line "daemons->updated = false;" in the last
if-clause, although I'm not sure how the variable is used.
Is it okay, Ralph?
Tetsuya
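
For readers following the thread, the role of the "updated" flag can be sketched roughly as below. This is only an illustration, not ORTE code: sketch_job_t and sketch_pack_updated_jobs are hypothetical names, and resetting the flag to false after packing is an assumption based on the "daemons->updated = false;" line quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for orte_job_t: only the fields needed here. */
typedef struct {
    int  jobid;
    bool updated;  /* true if the job changed since the last launch message */
} sketch_job_t;

/* Copy the ids of all updated jobs into out[] (capacity >= njobs) and clear
 * the flag on each one. Clearing the flag here is an assumption about what
 * happens after a launch. Returns the number of job ids written to out[]. */
size_t sketch_pack_updated_jobs(sketch_job_t *jobs, size_t njobs, int *out)
{
    assert(jobs != NULL && out != NULL);
    size_t packed = 0;
    for (size_t i = 0; i < njobs; i++) {
        if (jobs[i].updated) {
            out[packed++] = jobs[i].jobid;  /* only changed jobs are sent */
            jobs[i].updated = false;        /* assumed reset after launch */
        }
    }
    return packed;
}
```

With this shape, a second call made right after the first packs nothing, which is the size reduction the flag is meant to provide.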
Understood, and your logic is correct. It's just that I'd rather each launcher
decide to declare the daemons as reported rather than doing it in the common
code, just in case someone writes a launcher where they choose to respond
differently to the case where no new daemons need to be launched.
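
The division of responsibility described here might look roughly like the sketch below. All names (sketch_launch_state_t, sketch_count_new_nodes, sketch_launcher_start) are hypothetical and are not part of ORTE; the point is only that the common helper reports what it found, while each launcher decides how to react when no new daemons are needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical launch state; not an ORTE structure. */
typedef struct {
    bool daemons_reported;
} sketch_launch_state_t;

/* "Common code": only counts the nodes that still need a daemon.
 * It makes no decision about the launch state. */
size_t sketch_count_new_nodes(const bool *has_daemon, size_t nnodes)
{
    size_t n = 0;
    for (size_t i = 0; i < nnodes; i++) {
        if (!has_daemon[i]) {
            n++;
        }
    }
    return n;
}

/* One possible launcher policy: if nothing new must be launched, declare
 * the daemons as reported right away; otherwise wait for them to report.
 * A different launcher could choose a different response here. */
void sketch_launcher_start(const bool *has_daemon, size_t nnodes,
                           sketch_launch_state_t *st)
{
    assert(st != NULL);
    if (0 == sketch_count_new_nodes(has_daemon, nnodes)) {
        st->daemons_reported = true;
        return;
    }
    st->daemons_reported = false;  /* new daemons still have to report */
}
```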
I do not understand your fix yet, but it would be better, I guess.
I'll check it later, but for now please let me explain what I thought:
If some nodes are allocated, it doesn't go through this part because
opal_list_get_size(&nodes) > 0 at this location.
1590 if (0 == opal_list_get_size(&nodes)
Hmm...no, I don't think that's the correct patch. We want that function to
remain "clean" as its job is simply to construct the list of nodes for the VM.
It's the responsibility of the launcher to decide what to do with it.
Please see https://svn.open-mpi.org/trac/ompi/ticket/4408 for a fix
Ra
Hi Ralph, I found another corner case hangup in openmpi-1.7.5rc3.
Condition:
1. Allocate some nodes using a resource manager (RM) such as TORQUE.
2. Request only the head node when executing the job with the
-host or -hostfile option.
Example:
1. allocate node05,node06 using TORQUE.
2. request node05 only with -host op