Hmmm...that's not a "bug", but just a packaging issue with the way CentOS 
distributed some variants of OMPI that requires you to install/update things 
in a specific order.
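
If it would help: the usual cure is to update the Open MPI pieces together in 
a single transaction so the runtime and devel bits stay in sync. Something 
along these lines (I'm assuming the stock CentOS package names here; adjust 
for whichever variant you actually have installed):

    yum clean all
    yum update openmpi openmpi-devel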

On Jul 20, 2014, at 11:34 PM, Lane, William <william.l...@cshs.org> wrote:

> Please see:
> 
> http://bugs.centos.org/view.php?id=5812
> 
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
> [r...@open-mpi.org]
> Sent: Sunday, July 20, 2014 9:30 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
> 
> I'm unaware of any CentOS-OMPI bug, and I've been using CentOS throughout the 
> 6.x series running OMPI 1.6.x and above.
> 
> I can't speak to the older versions of CentOS and/or the older versions of 
> OMPI.
> 
> On Jul 19, 2014, at 8:14 PM, Lane, William <william.l...@cshs.org> wrote:
> 
>> Yes, there is a second HPC Sun Grid Engine cluster on which I've run this
>> openMPI test code dozens of times on upwards of 400 slots through SGE,
>> using qsub and qrsh, but that was with a much older version of openMPI
>> (1.3.3, I believe). On that particular cluster the open files hard and
>> soft limits were an issue.
>> 
>> I have noticed a recently reported (as of July 2014) CentOS openMPI bug
>> that occurs when CentOS is upgraded from 6.2 to 6.3. I'm not sure whether
>> that bug applies to this situation, though.
>> 
>> This particular problem occurs whether I submit jobs through SGE (via qrsh
>> or qsub) or run outside of SGE, which leads me to believe it is an openMPI
>> and/or CentOS issue.
>> 
>> -Bill Lane
>> 
>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>> [r...@open-mpi.org]
>> Sent: Saturday, July 19, 2014 3:21 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
>> 
>> Not for this test case size. You should be just fine with the default values.
>> 
>> If I understand you correctly, you've run this app at scale before on 
>> another cluster without problem?
>> 
>> On Jul 19, 2014, at 1:34 PM, Lane, William <william.l...@cshs.org> wrote:
>> 
>>> Ralph,
>>> 
>>> It's hard to imagine it's the openMPI code, because I've tested this code
>>> extensively on another cluster with 400 nodes and never had any problems.
>>> But I'll try the hello_c example in any case. Is it still recommended to
>>> raise the open files soft and hard limits to 4096, or might even larger
>>> values be necessary?
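>>> 
>>> For reference, the limits a process actually inherits on each node can be
>>> checked with "ulimit -Sn" / "ulimit -Hn", or programmatically with
>>> getrlimit; a minimal sketch (nothing openMPI-specific):
>>> 
>>>     #include <stdio.h>
>>>     #include <sys/resource.h>
>>> 
>>>     int main(void)
>>>     {
>>>         struct rlimit rl;
>>>         /* RLIMIT_NOFILE is the per-process open file descriptor limit */
>>>         if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
>>>             perror("getrlimit");
>>>             return 1;
>>>         }
>>>         printf("open files soft limit: %llu\n",
>>>                (unsigned long long) rl.rlim_cur);
>>>         printf("open files hard limit: %llu\n",
>>>                (unsigned long long) rl.rlim_max);
>>>         return 0;
>>>     }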
>>> 
>>> Thank you for your help.
>>> 
>>> -Bill Lane
>>> 
>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>> [r...@open-mpi.org]
>>> Sent: Saturday, July 19, 2014 8:07 AM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
>>> 
>>> That's a pretty old OMPI version, and we don't really support it any 
>>> longer. However, I can provide some advice:
>>> 
>>> * have you tried running the simple "hello_c" example we provide (a 
>>> minimal equivalent is sketched below)? This would at least tell you if 
>>> the problem is in your app, which is what I'd expect given your description
>>> 
>>> * try using gdb (or pick your debugger) to look at the corefile and see 
>>> where it is failing
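>>> 
>>> For reference, "hello_c" is essentially the following (a minimal
>>> equivalent I'm sketching from memory, not the exact shipped source):
>>> 
>>>     #include <stdio.h>
>>>     #include <mpi.h>
>>> 
>>>     int main(int argc, char *argv[])
>>>     {
>>>         int rank, size;
>>> 
>>>         /* initialize MPI, then report this process's rank and the
>>>            total number of ranks in the job */
>>>         MPI_Init(&argc, &argv);
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>         printf("Hello, world, I am %d of %d\n", rank, size);
>>>         MPI_Finalize();
>>>         return 0;
>>>     }
>>> 
>>> Build it with "mpicc hello.c -o hello_c" and run it with the same mpirun
>>> command line that fails for your app. For the corefile, something like
>>> "gdb /path/to/your_app /path/to/corefile" followed by "bt" at the gdb
>>> prompt will show the failing stack.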
>>> 
>>> I'd also suggest updating OMPI to the 1.6.5 or 1.8.1 versions, but I doubt 
>>> that's the issue behind this problem.
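>>> 
>>> If you want to try a newer version without touching the CentOS RPMs, a
>>> from-source build into your home directory is safe to keep side-by-side.
>>> A typical sequence (the prefix here is just an example):
>>> 
>>>     tar xzf openmpi-1.8.1.tar.gz
>>>     cd openmpi-1.8.1
>>>     ./configure --prefix=$HOME/openmpi-1.8.1
>>>     make -j4 all
>>>     make install
>>> 
>>> and then put $HOME/openmpi-1.8.1/bin first in your PATH.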
>>> 
>>> 
>>> On Jul 19, 2014, at 1:05 AM, Lane, William <william.l...@cshs.org> wrote:
>>> 
>>>> I'm getting consistent errors of the form:
>>>> 
>>>> "mpirun noticed that process rank 3 with PID 802 on node csclprd3-0-8 
>>>> exited on signal 11 (Segmentation fault)."
>>>> 
>>>> whenever I request more than 28 slots. These
>>>> errors even occur when I run mpirun locally
>>>> on a compute node that has 32 slots (8 cores, 16 with hyperthreading).
>>>> 
>>>> When I request fewer than 28 slots I have no problems whatsoever.
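>>>> 
>>>> For example, on the 32-slot compute node (the binary name below is just
>>>> a placeholder for my openMPI test code):
>>>> 
>>>>     mpirun -np 32 ./mpi_test    # segfaults with the error above
>>>>     mpirun -np 24 ./mpi_test    # completes normally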
>>>> 
>>>> OS: 
>>>> CentOS release 6.3 (Final)
>>>> 
>>>> openMPI information:
>>>>                  Package: Open MPI mockbu...@c6b8.bsys.dev.centos.org Distribution
>>>>                 Open MPI: 1.5.4
>>>>    Open MPI SVN revision: r25060
>>>>    Open MPI release date: Aug 18, 2011
>>>>                 Open RTE: 1.5.4
>>>>    Open RTE SVN revision: r25060
>>>>    Open RTE release date: Aug 18, 2011
>>>>                     OPAL: 1.5.4
>>>>        OPAL SVN revision: r25060
>>>>        OPAL release date: Aug 18, 2011
>>>>             Ident string: 1.5.4
>>>>                   Prefix: /usr/lib64/openmpi
>>>>  Configured architecture: x86_64-unknown-linux-gnu
>>>>           Configure host: c6b8.bsys.dev.centos.org
>>>>            Configured by: mockbuild
>>>>            Configured on: Fri Jun 22 06:42:03 UTC 2012
>>>>           Configure host: c6b8.bsys.dev.centos.org
>>>>                 Built by: mockbuild
>>>>                 Built on: Fri Jun 22 06:46:48 UTC 2012
>>>>               Built host: c6b8.bsys.dev.centos.org
>>>>               C bindings: yes
>>>>             C++ bindings: yes
>>>>       Fortran77 bindings: yes (all)
>>>>       Fortran90 bindings: yes
>>>>  Fortran90 bindings size: small
>>>>               C compiler: gcc
>>>>      C compiler absolute: /usr/bin/gcc
>>>>   C compiler family name: GNU
>>>>       C compiler version: 4.4.6
>>>>             C++ compiler: g++
>>>>    C++ compiler absolute: /usr/bin/g++
>>>>       Fortran77 compiler: gfortran
>>>>   Fortran77 compiler abs: /usr/bin/gfortran
>>>>       Fortran90 compiler: gfortran
>>>>   Fortran90 compiler abs: /usr/bin/gfortran
>>>>              C profiling: yes
>>>>            C++ profiling: yes
>>>>      Fortran77 profiling: yes
>>>>      Fortran90 profiling: yes
>>>>           C++ exceptions: no
>>>>           Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>>>>            Sparse Groups: no
>>>>   Internal debug support: no
>>>>   MPI interface warnings: no
>>>>      MPI parameter check: runtime
>>>> Memory profiling support: no
>>>> Memory debugging support: no
>>>>          libltdl support: yes
>>>>    Heterogeneous support: no
>>>>  mpirun default --prefix: no
>>>>          MPI I/O support: yes
>>>>        MPI_WTIME support: gettimeofday
>>>>      Symbol vis. support: yes
>>>>           MPI extensions: affinity example
>>>>    FT Checkpoint support: no (checkpoint thread: no)
>>>>   MPI_MAX_PROCESSOR_NAME: 256
>>>>     MPI_MAX_ERROR_STRING: 256
>>>>      MPI_MAX_OBJECT_NAME: 64
>>>>         MPI_MAX_INFO_KEY: 36
>>>>         MPI_MAX_INFO_VAL: 256
>>>>        MPI_MAX_PORT_NAME: 1024
>>>>   MPI_MAX_DATAREP_STRING: 128
>>>>            MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.5.4)
>>>>           MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA memory: linux (MCA v2.0, API v2.0, Component v1.5.4)
>>>>            MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA carto: file (MCA v2.0, API v2.0, Component v1.5.4)
>>>>            MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.5.4)
>>>>            MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA timer: linux (MCA v2.0, API v2.0, Component v1.5.4)
>>>>          MCA installdirs: env (MCA v2.0, API v2.0, Component v1.5.4)
>>>>          MCA installdirs: config (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.5.4)
>>>>            MCA allocator: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>>>            MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA coll: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA coll: inter (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA coll: self (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA coll: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA coll: sync (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA coll: tuned (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA mpool: fake (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA mpool: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA pml: bfo (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA pml: csum (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA pml: v (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA rcache: vma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA btl: ofud (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA btl: openib (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA btl: self (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA btl: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA topo: unity (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA iof: orted (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA iof: tool (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                 MCA odls: default (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ras: cm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ras: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA rml: oob (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA routed: binomial (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA routed: cm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA routed: direct (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA routed: linear (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA routed: radix (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA routed: slave (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA plm: rshd (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA plm: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                MCA filem: rsh (MCA v2.0, API v2.0, Component v1.5.4)
>>>>               MCA errmgr: default (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ess: env (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ess: slave (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ess: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.5.4)
>>>>                  MCA ess: tool (MCA v2.0, API v2.0, Component v1.5.4)
>>>>              MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.5.4)
>>>>              MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>>>              MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.5.4)
>>>>             MCA notifier: command (MCA v2.0, API v1.0, Component v1.5.4)
>>>>             MCA notifier: smtp (MCA v2.0, API v1.0, Component v1.5.4)
>>>>             MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.5.4)
>>>> 
>>> 
>> 
> 
