I’m not sure our man pages are good enough to answer your question, but here is 
the URL

http://www.open-mpi.org/doc/v1.8/

I’m a tad tied up right now, but I’ll try to address this prior to the 1.8.5 
release. Thanks for all that debug effort! Helps a bunch.
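
If you'd rather read them locally in the meantime, here's a minimal workaround (just a sketch, assuming the 1.8.2 install prefix is the same $MPI_DIR you pass to --prefix and that the man pages were installed with it): point MANPATH at the install tree.

    export MANPATH=$MPI_DIR/share/man:$MANPATH
    man mpirun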

> On Apr 7, 2015, at 1:17 PM, Lane, William <william.l...@cshs.org> wrote:
> 
> Ralph,
> 
> I've finally had some luck using the following:
> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile hostfile-single 
> --mca btl_tcp_if_include eth0 --hetero-nodes --use-hwthread-cpus --prefix 
> $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN
> 
> Where $NSLOTS was 56 and my hostfile hostfile-single is:
> 
> csclprd3-0-0 slots=12 max-slots=24
> csclprd3-0-1 slots=6 max-slots=12
> csclprd3-0-2 slots=6 max-slots=12
> csclprd3-0-3 slots=6 max-slots=12
> csclprd3-0-4 slots=6 max-slots=12
> csclprd3-0-5 slots=6 max-slots=12
> csclprd3-0-6 slots=6 max-slots=12
> csclprd3-6-1 slots=4 max-slots=4
> csclprd3-6-5 slots=4 max-slots=4
> 
> The max-slots value differs from slots on some nodes
> because I include the hyperthreaded cores in
> max-slots; the last two nodes have CPUs that
> don't support hyperthreading at all.
> 
> Does --use-hwthread-cpus prevent slots from
> being assigned to hyperthreading cores?
> 
> For some reason the man pages for OpenMPI 1.8.2
> aren't installed on our CentOS 6.3 systems. Is there a
> URL where I can find a copy of the man pages for OpenMPI 1.8.2?
> 
> Thanks for your help,
> 
> -Bill Lane
> 
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
> [r...@open-mpi.org]
> Sent: Monday, April 06, 2015 1:39 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
> 
> Hmmm…well, that shouldn’t be the issue. To check, try running it with 
> "--bind-to none". If you can get a backtrace telling us where it is crashing, 
> that would also help.
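> 
> One way to get that backtrace (just a sketch, assuming the compute nodes allow 
> core dumps and use the default core file naming) would be to enable core files 
> before launching, rerun with binding disabled, and open the resulting core in gdb:
> 
>     # allow core files in the environment the ranks run in (e.g. via the shell rc on each node)
>     ulimit -c unlimited
>     # rerun the benchmark with binding disabled, as suggested above
>     $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile hostfile-no_slots \
>         --mca btl_tcp_if_include eth0 --hetero-nodes --bind-to none \
>         --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN
>     # then print the stack from the core file left by the crashing rank
>     # (the file may be named core or core.<pid>, depending on kernel.core_pattern)
>     gdb $BENCH_DIR/$APP_DIR/$APP_BIN core
>     (gdb) bt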
> 
> 
>> On Apr 6, 2015, at 12:24 PM, Lane, William <william.l...@cshs.org> wrote:
>> 
>> Ralph,
>> 
>> For the following two different commandline invocations of the LAPACK 
>> benchmark
>> 
>> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile 
>> hostfile-no_slots --mca btl_tcp_if_include eth0 --hetero-nodes 
>> --use-hwthread-cpus --bind-to hwthread --prefix $MPI_DIR 
>> $BENCH_DIR/$APP_DIR/$APP_BIN
>> 
>> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile 
>> hostfile-no_slots --mca btl_tcp_if_include eth0 --hetero-nodes 
>> --bind-to-core --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN
>> 
>> I'm receiving the same kinds of OpenMPI error messages (but for different 
>> nodes in the ring):
>> 
>>         [csclprd3-0-16:25940] *** Process received signal ***
>>         [csclprd3-0-16:25940] Signal: Bus error (7)
>>         [csclprd3-0-16:25940] Signal code: Non-existant physical address (2)
>>         [csclprd3-0-16:25940] Failing at address: 0x7f8b1b5a2600
>> 
>>         --------------------------------------------------------------------------
>>         mpirun noticed that process rank 82 with PID 25936 on node csclprd3-0-16
>>         exited on signal 7 (Bus error).
>>         --------------------------------------------------------------------------
>>         16 total processes killed (some possibly by mpirun during cleanup)
>> 
>> It seems to occur on systems that have more than one physical CPU 
>> installed. Could this be due to a lack of the correct NUMA libraries 
>> being installed?
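>> 
>> As a quick sanity check (just a sketch, assuming the numactl package and 
>> hwloc's lstopo utility are available on the compute nodes), the NUMA/socket 
>> layout each node actually exposes could be compared with something like:
>> 
>>     rpm -q numactl numactl-devel    # which NUMA packages are actually installed
>>     numactl --hardware              # NUMA nodes, CPUs and memory per node
>>     lstopo                          # hwloc's view of sockets/cores/hwthreads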
>> 
>> -Bill L.
>> 
>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>> Sent: Sunday, April 05, 2015 6:09 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
>> 
>> 
>>> On Apr 5, 2015, at 5:58 PM, Lane, William <william.l...@cshs.org> wrote:
>>> 
>>> I think some of the Intel Blade systems in the cluster are
>>> dual core, but don't support hyperthreading. Maybe it
>>> would be better to exclude hyperthreading altogether
>>> from submitted OpenMPI jobs?
>> 
>> Yes - or you can add "--hetero-nodes --use-hwthread-cpus --bind-to hwthread" 
>> to the cmd line. The first option tells mpirun that the nodes aren’t all the 
>> same, so it has to look at each node’s topology instead of taking the first 
>> node as the template for everything. The second tells it to use the HTs as 
>> independent cpus where they are supported, and the third binds each process 
>> to a hwthread.
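>> 
>> Concretely (a sketch reusing the same variables and hostfile from your 
>> original command line), that would look something like:
>> 
>>     $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile hostfile \
>>         --mca btl_tcp_if_include eth0 --hetero-nodes --use-hwthread-cpus \
>>         --bind-to hwthread --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN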
>> 
>> I'm not entirely sure the suggestion will work - if we hit a place where HT 
>> isn't supported, we may balk at being asked to bind to HTs. I can probably 
>> make a change that supports this kind of hetero arrangement (perhaps 
>> something like bind-to pu) - might make it into 1.8.5 (we are just starting 
>> the release process on it now).
>> 
>>> 
>>> OpenMPI doesn't crash, but it doesn't run the LAPACK
>>> benchmark either.
>>> 
>>> Thanks again Ralph.
>>> 
>>> Bill L.
>>> 
>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>>> Sent: Wednesday, April 01, 2015 8:40 AM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
>>> 
>>> Bingo - you said the magic word. This is a terminology issue. When we say 
>>> "core", we mean the old definition of "core", not "hyperthreads". If you 
>>> want to use HTs as your base processing unit and bind to them, then you 
>>> need to specify --bind-to hwthread. That warning should then go away.
>>> 
>>> We don't require a swap region be mounted - I didn't see anything in your 
>>> original message indicating that OMPI had actually crashed, but just wasn't 
>>> launching due to the above issue. Were you actually seeing crashes as well?
>>> 
>>> 
>>> On Wed, Apr 1, 2015 at 8:31 AM, Lane, William <william.l...@cshs.org> wrote:
>>> Ralph,
>>> 
>>> Here's the associated hostfile:
>>> 
>>> #openMPI hostfile for csclprd3
>>> #max slots prevents oversubscribing csclprd3-0-9
>>> csclprd3-0-0 slots=12 max-slots=12
>>> csclprd3-0-1 slots=6 max-slots=6
>>> csclprd3-0-2 slots=6 max-slots=6
>>> csclprd3-0-3 slots=6 max-slots=6
>>> csclprd3-0-4 slots=6 max-slots=6
>>> csclprd3-0-5 slots=6 max-slots=6
>>> csclprd3-0-6 slots=6 max-slots=6
>>> csclprd3-0-7 slots=32 max-slots=32
>>> csclprd3-0-8 slots=32 max-slots=32
>>> csclprd3-0-9 slots=32 max-slots=32
>>> csclprd3-0-10 slots=32 max-slots=32
>>> csclprd3-0-11 slots=32 max-slots=32
>>> csclprd3-0-12 slots=12 max-slots=12
>>> csclprd3-0-13 slots=24 max-slots=24
>>> csclprd3-0-14 slots=16 max-slots=16
>>> csclprd3-0-15 slots=16 max-slots=16
>>> csclprd3-0-16 slots=24 max-slots=24
>>> csclprd3-0-17 slots=24 max-slots=24
>>> csclprd3-6-1 slots=4 max-slots=4
>>> csclprd3-6-5 slots=4 max-slots=4
>>> 
>>> The number of slots also includes hyperthreading
>>> cores.
>>> 
>>> One more question: would not having swap partitions
>>> defined on all the nodes in the ring cause OpenMPI
>>> to crash? No swap partitions are defined
>>> for any of the above systems.
>>> 
>>> -Bill L.
>>> 
>>> 
>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>>> Sent: Wednesday, April 01, 2015 5:04 AM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
>>> 
>>> The warning about binding to memory is due to not having numactl-devel 
>>> installed on the system. The job would still run, but we are warning you 
>>> that we cannot bind memory to the same domain as the core where we bind the 
>>> process. Can cause poor performance, but not fatal. I forget the name of 
>>> the param, but you can tell us to "shut up" :-)
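>>> 
>>> If you do want the memory binding, the usual fix (a sketch for CentOS; note 
>>> that Open MPI generally has to be reconfigured/rebuilt afterward so its hwloc 
>>> support picks up libnuma) would be something like:
>>> 
>>>     # as root on each compute node
>>>     yum install numactl numactl-devel
>>>     # then rebuild/reinstall Open MPI 1.8.x so memory binding support is compiled in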
>>> 
>>> The other warning/error indicates that we aren’t seeing enough cores on the 
>>> allocation you gave us via the hostfile to support one proc/core - i.e., we 
>>> didn’t see at least 128 cores in the sum of the nodes you told us about. I take 
>>> it you were expecting that there were that many or more?
>>> 
>>> Ralph
>>> 
>>> 
>>> On Wed, Apr 1, 2015 at 12:54 AM, Lane, William <william.l...@cshs.org> wrote:
>>> I'm having problems running OpenMPI jobs
>>> (using a hostfile) on an HPC cluster running
>>> ROCKS on CentOS 6.3. I'm running OpenMPI
>>> outside of Sun Grid Engine (i.e. it is not submitted
>>> as a job to SGE). The program being run is a LAPACK
>>> benchmark. The command line I'm
>>> using to run the jobs is:
>>> 
>>> $MPI_DIR/bin/mpirun -np $NSLOTS -bind-to-core -report-bindings --hostfile 
>>> hostfile --mca btl_tcp_if_include eth0 --prefix $MPI_DIR 
>>> $BENCH_DIR/$APP_DIR/$APP_BIN
>>> 
>>> Where MPI_DIR=/hpc/apps/mpi/openmpi/1.8.2/
>>> NSLOTS=128
>>> 
>>> I'm getting errors of the following form, and OpenMPI never runs the LAPACK benchmark:
>>> 
>>>    --------------------------------------------------------------------------
>>>    WARNING: a request was made to bind a process. While the system
>>>    supports binding the process itself, at least one node does NOT
>>>    support binding memory to the process location.
>>> 
>>>     Node:  csclprd3-0-11
>>> 
>>>    This usually is due to not having the required NUMA support installed
>>>    on the node. In some Linux distributions, the required support is
>>>    contained in the libnumactl and libnumactl-devel packages.
>>>    This is a warning only; your job will continue, though performance may
>>>    be degraded.
>>>    --------------------------------------------------------------------------
>>> 
>>>    --------------------------------------------------------------------------
>>>    A request was made to bind to that would result in binding more
>>>    processes than cpus on a resource:
>>> 
>>>       Bind to:     CORE
>>>       Node:        csclprd3-0-11
>>>       #processes:  2
>>>       #cpus:       1
>>> 
>>>    You can override this protection by adding the "overload-allowed"
>>>    option to your binding directive.
>>>    --------------------------------------------------------------------------
>>> 
>>> The only installed numa packages are:
>>> 
>>>         numactl.x86_64        2.0.7-3.el6        @centos6.3-x86_64-0/$
>>> 
>>> When I search for the available NUMA packages I find:
>>> 
>>> yum search numa | less
>>> 
>>>         Loaded plugins: fastestmirror
>>>         Loading mirror speeds from cached hostfile
>>>         ============================== N/S Matched: numa ===============================
>>>         numactl-devel.i686 : Development package for building Applications that use numa
>>>         numactl-devel.x86_64 : Development package for building Applications that use numa
>>>         numad.x86_64 : NUMA user daemon
>>>         numactl.i686 : Library for tuning for Non Uniform Memory Access machines
>>>         numactl.x86_64 : Library for tuning for Non Uniform Memory Access machines
>>> 
>>> Do I need to install additional and/or different NUMA packages in order to 
>>> get OpenMPI to work
>>> on this cluster?
>>> 
>>> -Bill Lane