Ken,

Why are the qsub ncpus and mpirun -np different values?

Doug
On Jun 11, 2015, at 12:08 PM, Leiter, Kenneth W CIV USARMY ARL (US) 
<kenneth.w.leiter2....@mail.mil> wrote:

> Hi Howard,
> 
> My qsub command is:
> qsub -l select=10:ncpus=32:mpiprocs=32 -q debug -l walltime=01:00:00 -I
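> 
> (For context: select=10:ncpus=32:mpiprocs=32 requests 10 nodes x 32 MPI
> slots = 320 slots in total; the -np 9 below launches only the parent
> processes, and MPI_Comm_spawn places the spawned children into the
> remaining slots of the allocation.)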
> 
> I have also tried using ccm mode with no change in outcome.
> 
> My mpirun command is:
> mpirun -np 9 -debug-daemons ./parent child
> 
> I have also attached to this message the debug output for the particular 
> daemon that crashes.
> 
> I have access to a few other Cray machines I can try this on: an XE6 and an 
> XC30.
> 
> - Ken Leiter
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
> Sent: Thursday, June 11, 2015 12:21 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] orted segmentation fault in pmix on master
> 
> Hello Ken,
> 
> Could you give the details of the allocation request (qsub args) as well as 
> the mpirun command line args? I'm trying to reproduce on the NERSC system.
> 
> If you have access to a similar-size non-Cray cluster, it would be 
> interesting to see whether you get the same problem there.
> 
> Howard
> 
> 
> 2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
> 
> 
>       I don’t have a Cray, but let me see if I can reproduce this on
>       something else.
>       
>       > On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US)
>       > <kenneth.w.leiter2....@mail.mil> wrote:
>       >
>       > Hello,
>       >
>       > I am attempting to use the Open MPI development master for a code
>       > that uses dynamic process management (i.e. MPI_Comm_spawn) on our
>       > Cray XC40 at the Army Research Laboratory. After reading through
>       > the mailing list I came to the conclusion that the master branch is
>       > the only hope for getting this to work on the newer Cray machines.
>       >
>       > To test, I am using the cpi-master.c / cpi-worker.c example. The
>       > test works when executing on a small number of processors, five or
>       > fewer, but begins to fail with segmentation faults in orted when
>       > using more processors. Even with five or fewer processors, I am
>       > spreading the computation across more than one node. I am using
>       > the Cray uGNI BTL through the ALPS scheduler.
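>       >
>       > For reference, the spawn pattern in that example is roughly the
>       > following (a minimal sketch, not the literal cpi-master.c; the
>       > worker count and binary name here are placeholders):
>       >
>       >     /* parent side: launch workers with MPI_Comm_spawn */
>       >     #include <mpi.h>
>       >
>       >     int main(int argc, char **argv)
>       >     {
>       >         MPI_Comm children;
>       >         MPI_Init(&argc, &argv);
>       >         /* spawn 4 copies of the worker binary (placeholders) */
>       >         MPI_Comm_spawn("./cpi-worker", MPI_ARGV_NULL, 4,
>       >                        MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>       >                        &children, MPI_ERRCODES_IGNORE);
>       >         /* the workers call MPI_Comm_get_parent() to attach */
>       >         MPI_Comm_disconnect(&children);
>       >         MPI_Finalize();
>       >         return 0;
>       >     }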
>       >
>       > I get a core file from orted and have tracked the seg fault down
>       > to pmix_server_process_msgs.c:420, where req->proxy is NULL. I
>       > have tried reading the code to understand how this happens, but am
>       > unsure. I do see that in the if statement where I take the else
>       > branch, the other branch specifically checks
>       > "if (NULL == req->proxy)"; however, no such check is done in the
>       > else branch.
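>       >
>       > Schematically, the asymmetry I mean looks like this (a hypothetical
>       > reconstruction with invented names, not the literal source):
>       >
>       >     typedef struct { char *proxy; } req_t;
>       >
>       >     void process(req_t *req, int is_local)
>       >     {
>       >         if (is_local) {
>       >             if (NULL == req->proxy) {
>       >                 return;            /* NULL proxy handled here */
>       >             }
>       >             /* ... use req->proxy ... */
>       >         } else {
>       >             /* no corresponding check: req->proxy is dereferenced
>       >              * unconditionally, so a NULL proxy crashes here */
>       >             char first = req->proxy[0];
>       >             (void)first;
>       >         }
>       >     }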
>       >
>       > I have debug output dumped for the failing runs. I can provide the
>       > output along with ompi_info output and config.log to anyone who is
>       > interested.
>       >
>       > - Ken Leiter
>       >
> <error_output.tar.bz2>
