Hi Siegmar,
mpiexec and java run as distinct processes. Your JRE message
says the java process raises SEGV, so you should trace the java
process, not the mpiexec process. Moreover, your JRE message
says the crash happened outside the Java Virtual Machine, in
native code, so a usual Java program debugger will not help much here.
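If it helps, one rough sketch of getting a native backtrace is to let the
crashing java process dump core and open that core with gdb (the program
name below is a placeholder, and the hs_err_pid*.log file the JRE writes
often already contains the native frames):

    ulimit -c unlimited              # allow the java process to dump core
    mpiexec -np 1 java MyMpiProg     # reproduce the SEGV; MyMpiProg is a placeholder
    gdb $(which java) core           # inspect the native stack at the crash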
Creating special files on NFS can be weird; try one of the other tmp-style locations:
/var/tmp/
/dev/shm (RAM disk, be careful!)
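For example, a minimal sketch (assuming your Open MPI honors TMPDIR and
supports the orte_tmpdir_base MCA parameter; the application name is a
placeholder):

    export TMPDIR=/var/tmp
    # or point Open MPI's session directory there directly:
    mpirun --mca orte_tmpdir_base /var/tmp -np 8 ./your_app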
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Oct 21, 2014, at 10:18 PM, Vinson Leung wrote:
Because of a permission issue (OpenMPI cannot write its temporary files to the
default /tmp directory), I changed TMPDIR to my local directory (export
TMPDIR=/home/user/tmp) and then the MPI program can run. But the CPU
utilization is very low, under 20% (8 MPI ranks running on an Intel Xeon 8-core
CPU).
Hi Bill
I have a stock 2.6.X CentOS kernel.
I set both parameters.
It works.
Maybe the parameter names have changed in the 3.X kernels?
(Which is really bad ...)
You could check if there is more information in:
/sys/module/mlx4_core/parameters/
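For example (the exact parameter names vary with the driver version):

    ls /sys/module/mlx4_core/parameters/
    cat /sys/module/mlx4_core/parameters/log_mtts_per_seg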
There seems to be a thread on the list about this (but ap
On 10/21/2014 04:18 PM, Gus Correa wrote:
> Hi Bill
>
> Maybe you're missing these settings in /etc/modprobe.d/mlx4_core.conf ?
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
Ah, that helped. Although:
/lib/modules/3.13.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx
Hi Bill
Maybe you're missing these settings in /etc/modprobe.d/mlx4_core.conf ?
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
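For example, the conf file usually ends up looking something like the sketch
below (the values are placeholders; compute the right ones for your RAM size
from the formula in that FAQ entry):

    # /etc/modprobe.d/mlx4_core.conf
    options mlx4_core log_num_mtt=24 log_mtts_per_seg=3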
I hope this helps,
Gus Correa
On 10/21/2014 06:36 PM, Bill Broadley wrote:
I've set up several clusters over the years with OpenMPI. I often get the error
below:
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.
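For what it's worth, one way to estimate the limit the driver is imposing
(a sketch; it assumes the older mlx4_core parameter names are present and
a 4 KiB page size):

    # max registerable memory ~= 2^log_num_mtt * 2^log_mtts_per_seg * page_size
    mtt=$(cat /sys/module/mlx4_core/parameters/log_num_mtt)
    seg=$(cat /sys/module/mlx4_core/parameters/log_mtts_per_seg)
    echo $(( (1 << mtt) * (1 << seg) * 4096 )) bytes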
Hi,
I installed openmpi-dev-124-g91e9686 on Solaris 10 Sparc with
gcc-4.9.1 to track down the error with my small Java program.
I started single-stepping in orterun.c at line 1081 and
continued until I got the segmentation fault. I get
"jdata = 0x0" in version openmpi-1.8.2a1r31804, which is the l
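In case anyone wants to reproduce that session, something along these lines
should work (a sketch; it assumes Open MPI was built with debug symbols, and
the Java class name is a placeholder):

    gdb --args orterun -np 1 java MyProg
    (gdb) break orterun.c:1081
    (gdb) run
    (gdb) print jdata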
At those sizes it is possible you are running into resource
exhaustion issues. Some of the resource exhaustion code paths still lead
to hangs. If the code does not need to be fully connected, I would
suggest not using mpi_preconnect_mpi but instead tracking down why the
initial MPI_Allreduce hangs. I w
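For reference, preconnect is controlled by an MCA parameter, so leaving it
off is just (a sketch; the process count and application name are placeholders):

    mpirun --mca mpi_preconnect_mpi 0 -np 16 ./your_app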