Won't help him. aprun does not support dynamic process management.

-Nathan

On Thu, Jun 11, 2015 at 12:43:38PM -0400, Joshua Ladd wrote:
>    Ken,
> 
>    Could you try to launch the job with aprun instead of mpirun?
> 
>    Thanks,
> 
>    Josh
>    On Thu, Jun 11, 2015 at 12:21 PM, Howard Pritchard <hpprit...@gmail.com>
>    wrote:
> 
>      Hello Ken,
>      Could you give the details of the allocation request (qsub args)
>      as well as the mpirun command line args? I'm trying to reproduce
>      on the nersc system.
>      It would also be interesting to know whether you see the same
>      problem on a similarly sized non-Cray cluster, if you have access
>      to one.
>      Howard
>      2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
> 
>        I don't have a Cray, but let me see if I can reproduce this on
>        something else
> 
>        > On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US)
>        <kenneth.w.leiter2....@mail.mil> wrote:
>        >
>        > Hello,
>        >
>        > I am attempting to use the Open MPI development master for a code
>        > that uses dynamic process management (i.e., MPI_Comm_spawn) on our
>        > Cray XC40 at the Army Research Laboratory. After reading through
>        > the mailing list, I came to the conclusion that the master branch
>        > is the only hope for getting this to work on the newer Cray
>        > machines.
>        >
>        > To test, I am using the cpi-master.c / cpi-worker.c example. The
>        > test works when executing on a small number of processors, five or
>        > fewer, but begins to fail with segmentation faults in orted when
>        > using more processors. Even with five or fewer processors, I am
>        > spreading the computation across more than one node. I am using
>        > the Cray uGNI BTL through the ALPS scheduler.
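
[For reference, the master side of a spawn test like this follows roughly the pattern below. This is a hedged sketch, not the actual cpi-master.c from the example; the worker executable name and the worker count are placeholders. It will only run under an MPI implementation and launcher that support dynamic process management, which is the limitation with aprun raised later in the thread.]

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm workers;   /* intercommunicator to the spawned worker jobs */
    int errs[4];        /* per-worker spawn error codes */

    MPI_Init(&argc, &argv);

    /* Spawn 4 worker processes running a separate executable. The
     * resource manager must support dynamic process management for
     * this call to succeed; "./cpi-worker" and the count 4 are
     * placeholders, not taken from the actual example. */
    MPI_Comm_spawn("./cpi-worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, errs);

    /* ... master/worker communication over the intercommunicator ... */

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}
```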
>        >
>        > I get a core file from orted and have tracked the segfault down
>        > to pmix_server_process_msgs.c:420, where req->proxy is NULL. I
>        > have tried reading the code to understand how this happens, but
>        > am unsure. I do see that in the if statement where I take the
>        > else branch, the other branch specifically checks "if (NULL ==
>        > req->proxy)" - however, no such check is done in the else branch.
>        >
>        > I have debug output dumped for the failing runs. I can provide
>        > the output along with ompi_info output and config.log to anyone
>        > who is interested.
>        >
>        > - Ken Leiter
>        >
>        > _______________________________________________
>        > users mailing list
>        > us...@open-mpi.org
>        > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>        > Link to this post:
>        http://www.open-mpi.org/community/lists/users/2015/06/27094.php
> 
>        _______________________________________________
>        users mailing list
>        us...@open-mpi.org
>        Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>        Link to this post:
>        http://www.open-mpi.org/community/lists/users/2015/06/27095.php
> 
>      _______________________________________________
>      users mailing list
>      us...@open-mpi.org
>      Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>      Link to this post:
>      http://www.open-mpi.org/community/lists/users/2015/06/27098.php

> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/06/27100.php
