Thanks, that actually solved one of the errors:
->mpirun -n 1 hello
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel
module parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              n01
  Registerable memory:     32768 MiB
  Total memory:            65503 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
Process 0 on n01 out of 1
BTW, the node has 64 GB of RAM in total. Is it possible that Open MPI is
limited to only 32 GB, or does the OFED installation impose such a limit?
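(For reference, the 32768 MiB ceiling is usually set by the mlx4_core kernel
module rather than by Open MPI itself: per the FAQ item above, registerable
memory is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page_size. A rough
sketch of checking and raising it follows; the parameter values and the
modprobe file name are assumptions for a 64 GB node, not something taken
from the original post.)

# Current mlx4_core settings (standard sysfs paths for module parameters):
cat /sys/module/mlx4_core/parameters/log_num_mtt
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
getconf PAGE_SIZE

# With log_num_mtt=20, log_mtts_per_seg=3 and 4 KiB pages:
#   2^20 * 2^3 * 4096 bytes = 32 GiB, matching the 32768 MiB reported above.
# Raising log_num_mtt to 22 would allow ~128 GiB, i.e. roughly twice the
# physical RAM (a common recommendation). The file name is only an example:
echo "options mlx4_core log_num_mtt=22 log_mtts_per_seg=3" > /etc/modprobe.d/mlx4_core.conf

# Reload the IB stack so the new values take effect:
service openibd restart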
On 23/06/2013 17:58, Ralph Castain wrote:
Don't include udapl - that code may well be stale
Sent from my iPhone
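In case a full rebuild is inconvenient, the uDAPL BTL can also be excluded
at run time; a minimal sketch, assuming the usual Open MPI 1.6 MCA syntax
and the install prefix from the rpmbuild command further down:

# Exclude the uDAPL BTL for a single run ("^" means "everything except"):
mpirun --mca btl ^udapl -n 1 hello

# Or make the exclusion the default for every run from this install:
echo "btl = ^udapl" >> /opt/openmpi/1.6.4/gcc/etc/openmpi-mca-params.conf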
Hi,
I've encountered strange issues when trying to run a simple MPI job on a
single host which has IB.
The complete error output:
-> mpirun -n 1 hello
--------------------------------------------------------------------------
WARNING: Failed to open "ofa-v2-mlx4_0-1"
[DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].

This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file.  Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[[53031,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

  Module: uDAPL
  Host: n01

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel
module parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              n01
  Registerable memory:     32768 MiB
  Total memory:            65503 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
Process 0 on n01 out of 1
[n01:13534] 7 more processes have sent help message help-mpi-btl-udapl.txt / dat_ia_open fail
[n01:13534] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
My setup and other info follow:
OS: CentOS 6.3 x86_64
Installed OFED 3.5 from source (./install.pl --all).
Installed Open MPI 1.6.4 with the following build parameters:
rpmbuild --rebuild openmpi-1.6.4-1.src.rpm \
  --define '_prefix /opt/openmpi/1.6.4/gcc' \
  --define '_defaultdocdir /opt/openmpi/1.6.4/gcc' \
  --define '_mandir %{_prefix}/share/man' \
  --define '_datadir %{_prefix}/share' \
  --define 'configure_options --with-openib=/usr --with-openib-libdir=/usr/lib64 CC=gcc CXX=g++ F77=gfortran FC=gfortran --enable-mpirun-prefix-by-default --target=x86_64-unknown-linux-gnu --with-hwloc=/usr/local --with-libltdl --enable-branch-probabilities --with-udapl --with-sge --disable-vt' \
  --define 'use_default_rpm_opt_flags 1' \
  --define '_name openmpi-1.6.4_gcc' \
  --define 'install_shell_scripts 1' \
  --define 'shell_scripts_basename mpivars' \
  --define '_usr /usr' \
  --define 'ofed 0' \
  2>&1 | tee openmpi.build.sge
(--disable-vt was used because CUDA is present on the system and gets linked
in automatically by VampirTrace, which then becomes a dependency with no
matching RPM.)
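One way to confirm which transports actually made it into a given build is
to list the compiled BTL components; a quick check, assuming the ompi_info
from this install is first in PATH:

# openib should be listed; udapl should disappear once --with-udapl is dropped:
ompi_info | grep -i "MCA btl"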
Max locked memory is unlimited:
->ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515028
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
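Note that the interactive ulimit above is not necessarily what the launched
ranks see (e.g. when starting through SGE or over ssh); a quick sanity check
of the memlock limit in the actual launch environment:

# Should print "unlimited", matching the interactive shell above:
mpirun -n 1 bash -c 'ulimit -l'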
IB devices are present:
->ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.9.1000
        node_guid:                      0002:c903:004d:b0e2
        sys_image_guid:                 0002:c903:004d:b0e5
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xB0
        board_id:                       MT_0D90110009
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 2
                        port_lid:               53
                        port_lmc:               0x00
                        link_layer:             InfiniBand
the hello program source:
->cat hello.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();
    return 0;
}
simply compiled as:
mpicc hello.c -o hello
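Once uDAPL is out of the picture, one way to confirm that the openib BTL is
really the one being used is to restrict the run to it and turn on the
component-selection output; a sketch, with MCA parameter names as in the
1.6 series:

# Fail loudly if openib cannot be used, instead of silently falling back to TCP:
mpirun --mca btl openib,self --mca btl_base_verbose 30 -n 2 hello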
the IB modules seem to be present:
->service openibd status
HCA driver loaded
Configured IPoIB devices:
ib0
Currently active IPoIB devices:
ib0
The following OFED modules are loaded:
rdma_ucm
rdma_cm
ib_addr
ib_ipoib
mlx4_core
mlx4_ib
mlx4_en
ib_mthca
ib_uverbs
ib_umad
ib_sa
ib_cm
ib_mad
ib_core
iw_cxgb3
iw_cxgb4
iw_nes
ib_qib
Can anyone help?
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users