Not for this test case size. You should be just fine with the default values.
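
That said, if you ever want to sanity-check the limits the launched processes actually see, a quick way (just a minimal sketch; 4096 is the usual starting suggestion, not a magic number) is to have a rank print its own RLIMIT_NOFILE:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print the open-files limits this process actually sees. */
    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("open files: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        return 0;
    }

Launching that with mpirun on the problem node confirms whether whatever you set in /etc/security/limits.conf is actually reaching the remote processes.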

If I understand you correctly, you've run this app at scale before on another
cluster without problems?

On Jul 19, 2014, at 1:34 PM, Lane, William <william.l...@cshs.org> wrote:

> Ralph,
> 
> It's hard to imagine it's the Open MPI code, because I've tested this code
> extensively on another cluster with 400 nodes and never had any problems.
> But I'll try the hello_c example in any case. Is it still recommended to
> raise the open-files soft and hard limits to 4096? Or might even larger
> values be necessary?
> 
> Thank you for your help.
> 
> -Bill Lane
> 
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
> [r...@open-mpi.org]
> Sent: Saturday, July 19, 2014 8:07 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
> 
> That's a pretty old OMPI version, and we don't really support it any longer. 
> However, I can provide some advice:
> 
> * have you tried running the simple "hello_c" example we provide? This would 
> at least tell you if the problem is in your app, which is what I'd expect 
> given your description.
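> 
> (In case the examples/ directory didn't ship with your install: hello_c is
> essentially the classic MPI hello world, so a minimal equivalent would be
> 
>     #include <stdio.h>
>     #include "mpi.h"
> 
>     int main(int argc, char *argv[])
>     {
>         int rank, size;
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>         printf("Hello, world, I am %d of %d\n", rank, size);
>         MPI_Finalize();
>         return 0;
>     }
> 
> compiled with mpicc and run at the failing size, e.g. "mpirun -np 32
> ./hello_c". If that also segfaults above 28 slots, the problem is in the
> install or environment rather than in your app.)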
> 
> * try using gdb (or your debugger of choice) to look at the corefile and see 
> where it is failing.
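> 
> (Something along these lines, assuming core dumps are enabled (e.g. with
> "ulimit -c unlimited") and substituting whatever core file name your
> system actually produces; "your_app" here is just a placeholder:
> 
>     gdb ./your_app core.802
>     (gdb) bt
> 
> The backtrace will show the stack at the point of the SIGSEGV, which
> usually makes it clear whether the fault is in your code or in a library.)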
> 
> I'd also suggest updating OMPI to the 1.6.5 or 1.8.1 versions, but I doubt 
> that's the issue behind this problem.
> 
> 
> On Jul 19, 2014, at 1:05 AM, Lane, William <william.l...@cshs.org> wrote:
> 
>> I'm getting consistent errors of the form:
>> 
>> "mpirun noticed that process rank 3 with PID 802 on node csclprd3-0-8 exited 
>> on signal 11 (Segmentation fault)."
>> 
>> whenever I request more than 28 slots. These errors even occur when I run
>> mpirun locally on a compute node that has 32 slots (8 cores, 16 with
>> hyperthreading).
>> 
>> When I request fewer than 28 slots, I have no problems whatsoever.
>> 
>> OS: 
>> CentOS release 6.3 (Final)
>> 
>> Open MPI information:
>>                  Package: Open MPI mockbu...@c6b8.bsys.dev.centos.org Distribution
>>                 Open MPI: 1.5.4
>>    Open MPI SVN revision: r25060
>>    Open MPI release date: Aug 18, 2011
>>                 Open RTE: 1.5.4
>>    Open RTE SVN revision: r25060
>>    Open RTE release date: Aug 18, 2011
>>                     OPAL: 1.5.4
>>        OPAL SVN revision: r25060
>>        OPAL release date: Aug 18, 2011
>>             Ident string: 1.5.4
>>                   Prefix: /usr/lib64/openmpi
>>  Configured architecture: x86_64-unknown-linux-gnu
>>           Configure host: c6b8.bsys.dev.centos.org
>>            Configured by: mockbuild
>>            Configured on: Fri Jun 22 06:42:03 UTC 2012
>>           Configure host: c6b8.bsys.dev.centos.org
>>                 Built by: mockbuild
>>                 Built on: Fri Jun 22 06:46:48 UTC 2012
>>               Built host: c6b8.bsys.dev.centos.org
>>               C bindings: yes
>>             C++ bindings: yes
>>       Fortran77 bindings: yes (all)
>>       Fortran90 bindings: yes
>>  Fortran90 bindings size: small
>>               C compiler: gcc
>>      C compiler absolute: /usr/bin/gcc
>>   C compiler family name: GNU
>>       C compiler version: 4.4.6
>>             C++ compiler: g++
>>    C++ compiler absolute: /usr/bin/g++
>>       Fortran77 compiler: gfortran
>>   Fortran77 compiler abs: /usr/bin/gfortran
>>       Fortran90 compiler: gfortran
>>   Fortran90 compiler abs: /usr/bin/gfortran
>>              C profiling: yes
>>            C++ profiling: yes
>>      Fortran77 profiling: yes
>>      Fortran90 profiling: yes
>>           C++ exceptions: no
>>           Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>>            Sparse Groups: no
>>   Internal debug support: no
>>   MPI interface warnings: no
>>      MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>>          libltdl support: yes
>>    Heterogeneous support: no
>>  mpirun default --prefix: no
>>          MPI I/O support: yes
>>        MPI_WTIME support: gettimeofday
>>      Symbol vis. support: yes
>>           MPI extensions: affinity example
>>    FT Checkpoint support: no (checkpoint thread: no)
>>   MPI_MAX_PROCESSOR_NAME: 256
>>     MPI_MAX_ERROR_STRING: 256
>>      MPI_MAX_OBJECT_NAME: 64
>>         MPI_MAX_INFO_KEY: 36
>>         MPI_MAX_INFO_VAL: 256
>>        MPI_MAX_PORT_NAME: 1024
>>   MPI_MAX_DATAREP_STRING: 128
>>            MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.5.4)
>>           MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA memory: linux (MCA v2.0, API v2.0, Component v1.5.4)
>>            MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA carto: file (MCA v2.0, API v2.0, Component v1.5.4)
>>            MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.5.4)
>>            MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA timer: linux (MCA v2.0, API v2.0, Component v1.5.4)
>>          MCA installdirs: env (MCA v2.0, API v2.0, Component v1.5.4)
>>          MCA installdirs: config (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.5.4)
>>            MCA allocator: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>            MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA coll: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA coll: inter (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA coll: self (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA coll: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA coll: sync (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA coll: tuned (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA mpool: fake (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA mpool: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA pml: bfo (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA pml: csum (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA pml: v (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA rcache: vma (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA btl: ofud (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA btl: openib (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA btl: self (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA btl: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA topo: unity (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA iof: orted (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA iof: tool (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.5.4)
>>                 MCA odls: default (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ras: cm (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ras: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA rml: oob (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA routed: binomial (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA routed: cm (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA routed: direct (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA routed: linear (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA routed: radix (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA routed: slave (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA plm: rshd (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA plm: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>                MCA filem: rsh (MCA v2.0, API v2.0, Component v1.5.4)
>>               MCA errmgr: default (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ess: env (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ess: slave (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ess: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.5.4)
>>                  MCA ess: tool (MCA v2.0, API v2.0, Component v1.5.4)
>>              MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.5.4)
>>              MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>              MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.5.4)
>>             MCA notifier: command (MCA v2.0, API v1.0, Component v1.5.4)
>>             MCA notifier: smtp (MCA v2.0, API v1.0, Component v1.5.4)
>>             MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.5.4)