Hmmm… well, that shouldn't be the issue. To check, try running it with
"--bind-to none". If you can get a backtrace telling us where it is crashing,
that would also help.
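For concreteness, that suggestion might look like the following shell sketch. It only assembles and prints the mpirun command (the cluster-specific variables $MPI_DIR, $NSLOTS, $BENCH_DIR, $APP_DIR, $APP_BIN are the ones used in the invocations quoted below); enabling core dumps and the gdb step are assumptions about how a backtrace would be collected, not something Open MPI itself provides.

```shell
# Sketch only: re-run with binding disabled and core dumps enabled, so a
# SIGBUS leaves a core file from which a backtrace can be pulled.
ulimit -c unlimited || true   # may require root / limits.conf on some systems

# Same invocation as quoted below, with --bind-to none in place of the
# binding flags; assembled as a string here so the sketch stays inert.
cmd="$MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings \
 --hostfile hostfile-no_slots --mca btl_tcp_if_include eth0 \
 --hetero-nodes --bind-to none --prefix $MPI_DIR \
 $BENCH_DIR/$APP_DIR/$APP_BIN"
echo "$cmd"                   # run this on the cluster itself

# After a crash, on the failing node (core file naming is system-dependent):
#   gdb -batch -ex bt $BENCH_DIR/$APP_DIR/$APP_BIN core.<pid>
```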


> On Apr 6, 2015, at 12:24 PM, Lane, William <william.l...@cshs.org> wrote:
> 
> Ralph,
> 
> For the following two different command-line invocations of the LAPACK 
> benchmark:
> 
> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile 
> hostfile-no_slots --mca btl_tcp_if_include eth0 --hetero-nodes 
> --use-hwthread-cpus --bind-to hwthread --prefix $MPI_DIR 
> $BENCH_DIR/$APP_DIR/$APP_BIN
> 
> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile 
> hostfile-no_slots --mca btl_tcp_if_include eth0 --hetero-nodes --bind-to-core 
> --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN
> 
> I'm receiving the same kinds of OpenMPI error messages (but for different 
> nodes in the ring):
> 
>         [csclprd3-0-16:25940] *** Process received signal ***
>         [csclprd3-0-16:25940] Signal: Bus error (7)
>         [csclprd3-0-16:25940] Signal code: Non-existant physical address (2)
>         [csclprd3-0-16:25940] Failing at address: 0x7f8b1b5a2600
> 
>         --------------------------------------------------------------------------
>         mpirun noticed that process rank 82 with PID 25936 on node
>         csclprd3-0-16 exited on signal 7 (Bus error).
>         --------------------------------------------------------------------------
>         16 total processes killed (some possibly by mpirun during cleanup)
> 
> It seems to occur on systems that have more than one physical CPU installed. 
> Could this be due to the correct NUMA libraries not being installed?
> 
> -Bill L.
> 
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
> [r...@open-mpi.org]
> Sent: Sunday, April 05, 2015 6:09 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
> 
> 
>> On Apr 5, 2015, at 5:58 PM, Lane, William <william.l...@cshs.org> wrote:
>> 
>> I think some of the Intel Blade systems in the cluster are
>> dual core, but don't support hyperthreading. Maybe it
>> would be better to exclude hyperthreading altogether
>> from submitted OpenMPI jobs?
> 
> Yes - or you can add "--hetero-nodes --use-hwthread-cpus --bind-to hwthread" 
> to the cmd line. This tells mpirun that the nodes aren't all the same, and so 
> it has to look at each node's topology instead of taking the first node as 
> the template for everything. The second tells it to use the HTs as 
> independent cpus where they are supported.
> 
> I'm not entirely sure the suggestion will work - if we hit a place where HT 
> isn't supported, we may balk at being asked to bind to HTs. I can probably 
> make a change that supports this kind of hetero arrangement (perhaps 
> something like bind-to pu) - might make it into 1.8.5 (we are just starting 
> the release process on it now).
> 
>> 
>> OpenMPI doesn't crash, but it doesn't run the LAPACK
>> benchmark either.
>> 
>> Thanks again Ralph.
>> 
>> Bill L.
>> 
>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>> [r...@open-mpi.org]
>> Sent: Wednesday, April 01, 2015 8:40 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
>> 
>> Bingo - you said the magic word. This is a terminology issue. When we say 
>> "core", we mean the old definition of "core", not "hyperthreads". If you 
>> want to use HTs as your base processing unit and bind to them, then you need 
>> to specify --bind-to hwthread. That warning should then go away.
>> 
>> We don't require a swap region be mounted - I didn't see anything in your 
>> original message indicating that OMPI had actually crashed, but just wasn't 
>> launching due to the above issue. Were you actually seeing crashes as well?
>> 
>> 
>> On Wed, Apr 1, 2015 at 8:31 AM, Lane, William <william.l...@cshs.org> wrote:
>> Ralph,
>> 
>> Here's the associated hostfile:
>> 
>> #openMPI hostfile for csclprd3
>> #max slots prevents oversubscribing csclprd3-0-9
>> csclprd3-0-0 slots=12 max-slots=12
>> csclprd3-0-1 slots=6 max-slots=6
>> csclprd3-0-2 slots=6 max-slots=6
>> csclprd3-0-3 slots=6 max-slots=6
>> csclprd3-0-4 slots=6 max-slots=6
>> csclprd3-0-5 slots=6 max-slots=6
>> csclprd3-0-6 slots=6 max-slots=6
>> csclprd3-0-7 slots=32 max-slots=32
>> csclprd3-0-8 slots=32 max-slots=32
>> csclprd3-0-9 slots=32 max-slots=32
>> csclprd3-0-10 slots=32 max-slots=32
>> csclprd3-0-11 slots=32 max-slots=32
>> csclprd3-0-12 slots=12 max-slots=12
>> csclprd3-0-13 slots=24 max-slots=24
>> csclprd3-0-14 slots=16 max-slots=16
>> csclprd3-0-15 slots=16 max-slots=16
>> csclprd3-0-16 slots=24 max-slots=24
>> csclprd3-0-17 slots=24 max-slots=24
>> csclprd3-6-1 slots=4 max-slots=4
>> csclprd3-6-5 slots=4 max-slots=4
>> 
>> The number of slots also includes hyperthreading
>> cores.
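As a quick sanity check on the hostfile above, the slot counts can be totalled with a short awk pass. This is a sketch added for illustration (the awk one-liner is not from the thread); the hostfile contents are inlined verbatim, and the 128 comes from the NSLOTS value used elsewhere in the thread. Note that, as stated above, these counts include hyperthreads, so the number of physical cores per node is lower, which matters for --bind-to-core.

```shell
# Sum the slots= fields of the hostfile quoted above (comment lines skipped;
# max-slots= fields deliberately not counted).
total=$(awk '!/^#/ { for (i = 1; i <= NF; i++)
                       if ($i ~ /^slots=/) sum += substr($i, 7) }
             END { print sum }' <<'EOF'
#openMPI hostfile for csclprd3
#max slots prevents oversubscribing csclprd3-0-9
csclprd3-0-0 slots=12 max-slots=12
csclprd3-0-1 slots=6 max-slots=6
csclprd3-0-2 slots=6 max-slots=6
csclprd3-0-3 slots=6 max-slots=6
csclprd3-0-4 slots=6 max-slots=6
csclprd3-0-5 slots=6 max-slots=6
csclprd3-0-6 slots=6 max-slots=6
csclprd3-0-7 slots=32 max-slots=32
csclprd3-0-8 slots=32 max-slots=32
csclprd3-0-9 slots=32 max-slots=32
csclprd3-0-10 slots=32 max-slots=32
csclprd3-0-11 slots=32 max-slots=32
csclprd3-0-12 slots=12 max-slots=12
csclprd3-0-13 slots=24 max-slots=24
csclprd3-0-14 slots=16 max-slots=16
csclprd3-0-15 slots=16 max-slots=16
csclprd3-0-16 slots=24 max-slots=24
csclprd3-0-17 slots=24 max-slots=24
csclprd3-6-1 slots=4 max-slots=4
csclprd3-6-5 slots=4 max-slots=4
EOF
)
echo "$total"   # 332 slots in total, comfortably above NSLOTS=128
```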
>> 
>> One more question, would not having defined swap
>> partitions on all the nodes in the ring cause OpenMPI
>> to crash? Because no swap partitions are defined
>> for any of the above systems.
>> 
>> -Bill L.
>> 
>> 
>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>> [r...@open-mpi.org]
>> Sent: Wednesday, April 01, 2015 5:04 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
>> 
>> The warning about binding to memory is due to not having numactl-devel 
>> installed on the system. The job would still run, but we are warning you 
>> that we cannot bind memory to the same domain as the core where we bind the 
>> process. Can cause poor performance, but not fatal. I forget the name of the 
>> param, but you can tell us to "shut up" :-)
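Ralph's fix can be sketched as follows, assuming yum-managed nodes; the package name is the one that appears in the "yum search numa" output quoted elsewhere in this thread. The command is printed rather than executed, since installing it requires root on every node.

```shell
# Sketch: install the NUMA development package on each node so Open MPI can
# bind memory as well as processes. Printed only; run the output as root.
pkg="numactl-devel.x86_64"
echo "yum -y install $pkg"
```

On a ROCKS cluster this would typically be pushed out to all compute nodes via the cluster's node-management tooling rather than run by hand on each host.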
>> 
>> The other warning/error indicates that we aren't seeing enough cores in the 
>> allocation you gave us via the hostfile to support one proc/core - i.e., we 
>> didn't find at least 128 cores in the sum of the nodes you told us about. I 
>> take it you were expecting that there were that many or more?
>> 
>> Ralph
>> 
>> 
>> On Wed, Apr 1, 2015 at 12:54 AM, Lane, William <william.l...@cshs.org> wrote:
>> I'm having problems running OpenMPI jobs
>> (using a hostfile) on an HPC cluster running
>> ROCKS on CentOS 6.3. I'm running OpenMPI
>> outside of Sun Grid Engine (i.e. it is not submitted
>> as a job to SGE). The program being run is a LAPACK
>> benchmark. The command line I'm 
>> using to run the jobs is:
>> 
>> $MPI_DIR/bin/mpirun -np $NSLOTS -bind-to-core -report-bindings --hostfile 
>> hostfile --mca btl_tcp_if_include eth0 --prefix $MPI_DIR 
>> $BENCH_DIR/$APP_DIR/$APP_BIN
>> 
>> Where MPI_DIR=/hpc/apps/mpi/openmpi/1.8.2/
>> NSLOTS=128
>> 
>> I'm getting errors of the form and OpenMPI never runs the LAPACK benchmark:
>> 
>>    --------------------------------------------------------------------------
>>    WARNING: a request was made to bind a process. While the system
>>    supports binding the process itself, at least one node does NOT
>>    support binding memory to the process location.
>> 
>>     Node:  csclprd3-0-11
>> 
>>    This usually is due to not having the required NUMA support installed
>>    on the node. In some Linux distributions, the required support is
>>    contained in the libnumactl and libnumactl-devel packages.
>>    This is a warning only; your job will continue, though performance may be 
>> degraded.
>>    --------------------------------------------------------------------------
>> 
>>    --------------------------------------------------------------------------
>>    A request was made to bind to that would result in binding more
>>    processes than cpus on a resource:
>> 
>>       Bind to:     CORE
>>       Node:        csclprd3-0-11
>>       #processes:  2
>>       #cpus:       1
>> 
>>    You can override this protection by adding the "overload-allowed"
>>    option to your binding directive.
>>    --------------------------------------------------------------------------
>> 
>> The only installed numa packages are:
>> numactl.x86_64    2.0.7-3.el6    @centos6.3-x86_64-0/$
>> 
>> When I search for the available NUMA packages I find:
>> 
>> yum search numa | less
>> 
>>         Loaded plugins: fastestmirror
>>         Loading mirror speeds from cached hostfile
>>         ============================== N/S Matched: numa 
>> ===============================
>>         numactl-devel.i686 : Development package for building Applications 
>> that use numa
>>         numactl-devel.x86_64 : Development package for building Applications 
>> that use
>>                              : numa
>>         numad.x86_64 : NUMA user daemon
>>         numactl.i686 : Library for tuning for Non Uniform Memory Access 
>> machines
>>         numactl.x86_64 : Library for tuning for Non Uniform Memory Access 
>> machines
>> 
>> Do I need to install additional and/or different NUMA packages in order to 
>> get OpenMPI to work
>> on this cluster?
>> 
>> -Bill Lane
>> IMPORTANT WARNING: This message is intended for the use of the person or 
>> entity to which it is addressed and may contain information that is 
>> privileged and confidential, the disclosure of which is governed by 
>> applicable law. If the reader of this message is not the intended recipient, 
>> or the employee or agent responsible for delivering it to the intended 
>> recipient, you are hereby notified that any dissemination, distribution or 
>> copying of this information is strictly prohibited. Thank you for your 
>> cooperation.
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Searchable archives: http://www.open-mpi.org/community/lists/users/2015/04/index.php
>> 
>> 
