Hello,

I am attempting to use the Open MPI development master for a code that uses
dynamic process management (i.e. MPI_Comm_spawn) on our Cray XC40 at the
Army Research Laboratory. After reading through the mailing list, I came to
the conclusion that the master branch is the only hope for getting this to
work on the newer Cray machines.

To test, I am using the cpi-master.c / cpi-worker.c example. The test works
when executing on a small number of processors, five or fewer, but begins to
fail with segmentation faults in orted when using more processors. Even with
five or fewer processors, I am spreading the computation across more than one
node. I am using the Cray uGNI BTL through the ALPS scheduler.
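
For reference, the master side of my test follows the usual MPI_Comm_spawn
pattern, roughly like the sketch below (this is not the actual cpi-master.c;
the worker executable name, worker count, and the reduction are assumptions):

/* Rough sketch of the master side of an MPI_Comm_spawn test.
 * Not the actual cpi-master.c; names and counts are assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm workers;
    int nworkers = 4;                 /* assumed worker count */
    int errcodes[4];

    MPI_Init(&argc, &argv);

    /* Dynamic process management path that exercises orted/PMIx. */
    MPI_Comm_spawn("./cpi-worker", MPI_ARGV_NULL, nworkers,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF, &workers, errcodes);

    /* Collect one reduced value from the workers over the
     * intercommunicator, as the cpi example does with its pi estimate. */
    double local = 0.0, pi = 0.0;
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, MPI_ROOT, workers);
    printf("pi is approximately %.16f\n", pi);

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}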

I get a core file from orted and have tracked the segfault down to
pmix_server_process_msgs.c:420, where req->proxy is NULL. I have tried
reading the code to understand how this happens, but am unsure. I do see
that, in the if statement where my run takes the else branch, the other
branch specifically checks "if (NULL == req->proxy)"; however, no such check
is done in the else branch.
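
To illustrate what I mean (this is only a self-contained sketch with
hypothetical names, not the actual pmix_server_process_msgs.c source), the
asymmetry looks like this:

/* Illustration only, not the actual Open MPI code: one branch guards the
 * proxy pointer, the else branch dereferences it without a check. */
#include <stdio.h>
#include <stddef.h>

struct proxy   { const char *name; };
struct request { struct proxy *proxy; };

static int process(struct request *req, int local_target)
{
    if (local_target) {
        if (NULL == req->proxy) {
            /* guarded: a NULL proxy is handled and reported */
            fprintf(stderr, "no proxy for request\n");
            return -1;
        }
        printf("local path via %s\n", req->proxy->name);
    } else {
        /* unguarded: a NULL req->proxy segfaults here, which matches the
         * crash I observe at pmix_server_process_msgs.c:420 */
        printf("relay path via %s\n", req->proxy->name);
    }
    return 0;
}

int main(void)
{
    struct proxy p = { "orted-proxy" };
    struct request ok  = { &p };
    struct request bad = { NULL };

    process(&ok, 0);     /* fine: proxy is set */
    process(&bad, 1);    /* guarded branch handles the NULL proxy */
    /* process(&bad, 0) would dereference NULL and crash */
    return 0;
}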

I have dumped debug output from the failing runs. I can provide the output,
along with ompi_info output and config.log, to anyone who is interested.

- Ken Leiter
