I will try on a non-cray machine as well.

- Ken

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Thursday, June 11, 2015 12:21 PM
To: Open MPI Users
Subject: Re: [OMPI users] orted segmentation fault in pmix on master

Hello Ken,

Could you give the details of the allocation request (qsub args) as well as the
mpirun command line args? I'm trying to reproduce on the NERSC system.

If you have access to a similar-size non-Cray cluster, it would be interesting
to see whether you get the same problems there.

Howard


2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:


        I don’t have a Cray, but let me see if I can reproduce this on something else.
        
        > On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US) <kenneth.w.leiter2....@mail.mil> wrote:
        >
        > Hello,
        >
        > I am attempting to use the openmpi development master for a code that uses
        > dynamic process management (i.e. MPI_Comm_spawn) on our Cray XC40 at the
        > Army Research Laboratory. After reading through the mailing list I came to
        > the conclusion that the master branch is the only hope for getting this to
        > work on the newer Cray machines.
        >
        > To test I am using the cpi-master.c and cpi-worker.c example. The test works
        > when executing on a small number of processors, five or fewer, but begins to
        > fail with segmentation faults in orted when using more processors. Even with
        > five or fewer processors, I am spreading the computation to more than one
        > node. I am using the Cray ugni btl through the alps scheduler.
        >
        > I get a core file from orted and have the seg fault tracked down to
        > pmix_server_process_msgs.c:420 where req->proxy is NULL. I have tried
        > reading the code to understand how this happens, but am unsure. I do see
        > that in the if statement where I take the else branch, the other branch
        > specifically checks "if (NULL == req->proxy)" - however, no such check is
        > done in the else branch.
        >
        > I have debug output dumped for the failing runs. I can provide the output
        > along with ompi_info output and config.log to anyone who is interested.
        >
        > - Ken Leiter
        >
        > _______________________________________________
        > users mailing list
        > us...@open-mpi.org
        > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
        > Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27094.php
        
        _______________________________________________
        users mailing list
        us...@open-mpi.org
        Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
        Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27095.php
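
For readers following the thread, the pattern Ken describes (one branch checks "NULL == req->proxy" and the other does not) can be illustrated with a small stand-alone sketch. Everything below is hypothetical: the types request_t and proxy_t, the is_local flag, and the function names are invented for illustration and are not Open MPI's actual code or API. The thread also does not establish whether a guard like this is the right fix, or whether req->proxy should never be NULL at that point in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; NOT Open MPI's real types. */
typedef struct {
    const char *name;
} proxy_t;

typedef struct {
    int      is_local;  /* hypothetical flag selecting the branch taken */
    proxy_t *proxy;     /* may be NULL if no proxy was ever recorded */
} request_t;

enum { REQ_OK = 0, REQ_ERR_NO_PROXY = -1 };

/* Precondition: proxy is non-NULL. Callers are expected to check first. */
static int reply_via_proxy(const proxy_t *proxy, const char *kind)
{
    assert(proxy != NULL);
    printf("%s reply sent via %s\n", kind, proxy->name);
    return REQ_OK;
}

/*
 * Check req->proxy once, before branching, so that neither the "if"
 * branch nor the "else" branch can dereference a NULL pointer.
 */
int process_request(const request_t *req)
{
    if (req == NULL || req->proxy == NULL) {
        fprintf(stderr, "request has no proxy; returning an error\n");
        return REQ_ERR_NO_PROXY;
    }

    if (req->is_local) {
        return reply_via_proxy(req->proxy, "local");
    } else {
        /* This branch also uses the proxy, so it relies on the guard above. */
        return reply_via_proxy(req->proxy, "remote");
    }
}
```

Hoisting the check above the branch means a request without a proxy produces an error return instead of a segmentation fault, whichever branch would have handled it. It does not explain why the proxy was missing, which is the question the debug output would need to answer.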

