Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-23 Thread Mike Dubman
Hi,

How was mxm installed? By copying the files?

The RPM-based installation places mxm under /opt/mellanox/mxm, not in
/usr/lib64/libmxm.so.

Do you use HPC-X (a bundle of OMPI, MXM, and FCA)?
You can download HPC-X, extract it anywhere, and compile OMPI pointing at the
mxm location under HPC-X.

HPC-X also contains RPMs for mxm and fca.
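
A minimal sketch of pointing configure at an explicit mxm location. The helper name mxm_configure_arg, the HPCX_HOME variable, and the lib/libmxm.so layout inside the mxm tree are assumptions for illustration, not something confirmed in this thread; only --with-mxm itself appears in the original report.

```shell
#!/bin/sh
# Print a --with-mxm argument for configure if DIR looks like an mxm install.
# Assumption: an mxm tree keeps its library at DIR/lib/libmxm.so.
mxm_configure_arg() {
    dir=$1
    if [ -f "$dir/lib/libmxm.so" ]; then
        printf '%s\n' "--with-mxm=$dir"
    else
        echo "no libmxm.so under $dir/lib" >&2
        return 1
    fi
}

# Hypothetical locations: the RPM default, then an extracted HPC-X tree.
for candidate in /opt/mellanox/mxm "${HPCX_HOME:-$HOME/hpcx}/mxm"; do
    if arg=$(mxm_configure_arg "$candidate" 2>/dev/null); then
        # Only print the command so it can be reviewed before running.
        echo "./configure $arg --prefix=..."
        break
    fi
done
```

Passing a real directory this way should give configure a non-empty path to pair with any -L it generates.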


M

On Sat, May 23, 2015 at 1:07 AM, David Shrader wrote:

> Hello,
>
> I'm getting a spurious '-L' flag when I have mxm installed in system-space
> (/usr/lib64/libmxm.so) which is causing an error at link time during make:
>
> ...output snipped...
> /bin/sh ../../../../libtool  --tag=CC   --mode=link gcc -std=gnu99 -O3
> -DNDEBUG -I/opt/panfs/include -finline-functions -fno-strict-aliasing
> -pthread -module -avoid-version   -o libmca_mtl_mxm.la  mtl_mxm.lo
> mtl_mxm_cancel.lo mtl_mxm_component.lo mtl_mxm_endpoint.lo mtl_mxm_probe.lo
> mtl_mxm_recv.lo mtl_mxm_send.lo -lmxm -L -lrt -lm -lutil
> libtool: link: require no space between `-L' and `-lrt'
> make[2]: *** [libmca_mtl_mxm.la] Error 1
> make[2]: Leaving directory
> `/turquoise/usr/projects/hpctools/dshrader/hpcsoft/openmpi/1.8.5/openmpi-1.8.5/ompi/mca/mtl/mxm'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory
> `/turquoise/usr/projects/hpctools/dshrader/hpcsoft/openmpi/1.8.5/openmpi-1.8.5/ompi'
> make: *** [all-recursive] Error 1
>
> If I use --with-mxm=no, this error doesn't occur (as expected, since
> the mxm component isn't touched). Has anyone run into this before?
>
> Here is my configure line:
>
> ./configure --disable-silent-rules
> --with-platform=contrib/platform/lanl/toss/optimized-panasas --prefix=...
>
> I wonder if there is an empty variable somewhere in configure that should
> contain the directory libmxm is in. Since no directory is passed to
> --with-mxm, the empty value would end up paired with a "-L". I think I'll
> go through the configure script while waiting to see if anyone else has
> run into this.
>
> Thank you for any and all help,
> David
>
> --
> David Shrader
> HPC-3 High Performance Computer Systems
> Los Alamos National Lab
> Email: dshrader  lanl.gov
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/05/26904.php
>



-- 

Kind Regards,

M.
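
The empty-variable theory in David's message can be illustrated with a short shell sketch. This is not the actual Open MPI configure logic; lib_ldflags and mxm_libdir are hypothetical names, used only to show how an unguarded "-L$dir" turns into a bare -L when the directory is empty, and how a guard avoids it.

```shell
#!/bin/sh
# Hypothetical helper: build linker flags for a library, adding -L only
# when a library directory was actually supplied.
lib_ldflags() {
    lib=$1
    libdir=$2
    if [ -n "$libdir" ]; then
        printf '%s\n' "-L$libdir -l$lib"
    else
        printf '%s\n' "-l$lib"
    fi
}

mxm_libdir=""    # empty: mxm was found in a default system path

# Unguarded concatenation leaves a bare -L, as in the failing link line.
echo "unguarded: -lmxm -L$mxm_libdir -lrt"
# The guarded version omits -L entirely when no directory is known.
echo "guarded:   $(lib_ldflags mxm "$mxm_libdir") -lrt"
```

With an empty mxm_libdir, the unguarded line ends in "-lmxm -L -lrt", which is the sequence libtool rejects above.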


[OMPI users] Problems running linpack benchmark on old Sunfire opteron nodes

2015-05-23 Thread Lane, William
I've compiled the linpack benchmark using openMPI 1.8.5 libraries
and include files on CentOS 6.4.

I've tested the binary on the one Intel node (some
sort of 4-core Xeon) and it runs, but when I try to run it on any of
the old Sunfire opteron compute nodes it appears to hang (although
top indicates CPU and memory usage) and eventually terminates
by itself. I'm also getting the following openMPI error messages/warnings:

mpirun -np 16 --report-bindings --hostfile hostfile --prefix 
/hpc/apps/mpi/openmpi/1.8.5-dev --mca btl_tcp_if_include eth0 xhpl

[cscld1-0-6:24370] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1-0-3:24734] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1-0-7:25152] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1-0-4:18079] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1-0-8:21443] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1-0-2:19704] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1-0-5:13481] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1-0-0:21884] create_and_attach: unable to create shared memory BTL 
coordinating structure :: size 134217728
[cscld1:24240] 7 more processes have sent help message help-opal-shmem-mmap.txt 
/ target full

Note these errors also occur when I try to run the linpack benchmark on a single
node as well.

Does anyone know what's going on here? Google came up w/nothing and I have no
idea what a BTL coordinating structure is.

-Bill L.

IMPORTANT WARNING: This message is intended for the use of the person or entity 
to which it is addressed and may contain information that is privileged and 
confidential, the disclosure of which is governed by applicable law. If the 
reader of this message is not the intended recipient, or the employee or agent 
responsible for delivering it to the intended recipient, you are hereby 
notified that any dissemination, distribution or copying of this information is 
strictly prohibited. Thank you for your cooperation.


Re: [OMPI users] Problems running linpack benchmark on old Sunfire opteron nodes

2015-05-23 Thread Gilles Gouaillardet
Bill,

The root cause is most likely that there is not enough free space in /tmp.

The simplest, but slowest, option is to run mpirun --mca btl tcp ...
If you cannot free up enough space under /tmp (maybe you run diskless),
there are some options to create these kinds of files under /dev/shm instead.

Cheers,

Gilles
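
One way to check the free-space theory on a compute node is to compare what is available under /tmp and /dev/shm with the 134217728-byte (128 MiB) size from the error messages. The following is a minimal sketch, not an Open MPI tool: has_free_space is a hypothetical helper, and it assumes the backing file lands under /tmp as described above.

```shell
#!/bin/sh
# Return 0 if directory $1 has at least $2 bytes available, non-zero otherwise.
has_free_space() {
    dir=$1
    need_bytes=$2
    # df -P gives one line per filesystem; -k reports 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    # Round the requirement up to whole kilobytes.
    need_kb=$(( (need_bytes + 1023) / 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}

# 134217728 bytes is the segment size reported in the error messages.
for d in /tmp /dev/shm; do
    [ -d "$d" ] || continue
    if has_free_space "$d" 134217728; then
        echo "$d: room for one 128 MiB segment"
    else
        echo "$d: less than 128 MiB available"
    fi
done
```

Running this on one of the Opteron nodes and on the Xeon node should show whether the two differ in available space.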


On Saturday, May 23, 2015, Lane, William wrote:

>  I've compiled the linpack benchmark using openMPI 1.8.5 libraries
> and include files on CentOS 6.4.
>
> I've tested the binary on the one Intel node (some
> sort of 4-core Xeon) and it runs, but when I try to run it on any of
> the old Sunfire opteron compute nodes it appears to hang (although
> top indicates CPU and memory usage) and eventually terminates
> by itself. I'm also getting the following openMPI error messages/warnings:
>
> mpirun -np 16 --report-bindings --hostfile hostfile --prefix
> /hpc/apps/mpi/openmpi/1.8.5-dev --mca btl_tcp_if_include eth0 xhpl
>
> [cscld1-0-6:24370] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-3:24734] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-7:25152] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-4:18079] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-8:21443] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-2:19704] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-5:13481] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-0:21884] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1:24240] 7 more processes have sent help message
> help-opal-shmem-mmap.txt / target full
>
> Note these errors also occur when I try to run the linpack benchmark on a
> single node as well.
>
> Does anyone know what's going on here? Google came up w/nothing and I have
> no idea what a BTL coordinating structure is.
>
> -Bill L.