Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-21 Thread Kawashima, Takahiro
Hi Siegmar, mpiexec and java run as distinct processes. Your JRE message says the java process raised SIGSEGV, so you should trace the java process, not the mpiexec process. Moreover, your JRE message says the crash happened outside the Java Virtual Machine, in native code. So usual Java program debugger

Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-21 Thread Brock Palen
Doing special files on NFS can be weird; try the other temporary locations: /var/tmp/ or /dev/shm (a RAM disk, so be careful!). Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 > On Oct 21, 2014, at 10:18 PM, Vinson Leung wrote: > > Because of p
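The suggestion above amounts to trying a few candidate directories and pointing TMPDIR at the first one that works. A minimal sketch of that check follows; the helper name `first_writable_dir` is hypothetical, and the code only tests that a temporary file can be created, not whether the directory lives on NFS.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first existing directory in which a temp file can be created.

    Hypothetical helper for illustration: it checks writability only and
    does not detect whether a directory is NFS-mounted.
    """
    for path in candidates:
        if not os.path.isdir(path):
            continue
        try:
            # Creating (and automatically deleting) a temp file is a more
            # reliable test than inspecting permission bits.
            with tempfile.TemporaryFile(dir=path):
                return path
        except OSError:
            continue
    return None


if __name__ == "__main__":
    # Candidate locations taken from the message above.
    chosen = first_writable_dir(["/var/tmp", "/dev/shm"])
    if chosen is None:
        print("no writable candidate found")
    else:
        print("export TMPDIR=" + chosen)
```

Keep in mind that /dev/shm is backed by RAM, so anything written there counts against the node's memory.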

[OMPI users] low CPU utilization with OpenMPI

2014-10-21 Thread Vinson Leung
Because of a permission problem (OpenMPI cannot write temporary files to the default /tmp directory), I changed TMPDIR to my local directory (export TMPDIR=/home/user/tmp) and then the MPI program could run. But the CPU utilization is very low, under 20% (8 MPI ranks running on an Intel Xeon 8-core CPU).

Re: [OMPI users] New ib locked pages behavior?

2014-10-21 Thread Gus Correa
Hi Bill, I have the stock 2.6.X CentOS kernel. I set both parameters. It works. Maybe the parameter names changed in 3.X kernels? (Which is really bad ...) You could check if there is more information in: /sys/module/mlx4_core/parameters/ There seems to be a thread on the list about this (but ap
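For checking what the driver currently allows, the Open MPI FAQ entry linked in this thread gives the registerable memory as (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE. The sketch below reads values from the directory mentioned above and applies that formula. It assumes the two parameters are named `log_num_mtt` and `log_mtts_per_seg` as in the FAQ; the message itself does not name them, and they may be absent or named differently on other kernels, in which case the reader returns None.

```python
import os

# Directory mentioned in the message above.
PARAM_DIR = "/sys/module/mlx4_core/parameters"


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size):
    """Apply the FAQ formula: 2^log_num_mtt * 2^log_mtts_per_seg * page_size."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


def read_param(name, param_dir=PARAM_DIR):
    """Read an integer module parameter, or return None if it is unavailable."""
    try:
        with open(os.path.join(param_dir, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Parameter names are an assumption taken from the FAQ, not from this thread.
    num_mtt = read_param("log_num_mtt")
    per_seg = read_param("log_mtts_per_seg")
    if num_mtt is None or per_seg is None:
        print("parameters not found under", PARAM_DIR)
    else:
        page = os.sysconf("SC_PAGE_SIZE")
        limit = max_registerable_bytes(num_mtt, per_seg, page)
        print("registerable memory: %.1f GiB" % (limit / 2.0 ** 30))
```

As a worked case, log_num_mtt=20 and log_mtts_per_seg=3 with 4 KiB pages give 2^35 bytes, i.e. 32 GiB. The FAQ suggests setting the limit to roughly twice the physical memory of the node.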

Re: [OMPI users] New ib locked pages behavior?

2014-10-21 Thread Bill Broadley
On 10/21/2014 04:18 PM, Gus Correa wrote: > Hi Bill > > Maybe you're missing these settings in /etc/modprobe.d/mlx4_core.conf ? > > http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem Ah, that helped. Although: /lib/modules/3.13.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx

Re: [OMPI users] New ib locked pages behavior?

2014-10-21 Thread Gus Correa
Hi Bill Maybe you're missing these settings in /etc/modprobe.d/mlx4_core.conf ? http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem I hope this helps, Gus Correa On 10/21/2014 06:36 PM, Bill Broadley wrote: I've setup several clusters over the years with OpenMPI. I often get th

[OMPI users] New ib locked pages behavior?

2014-10-21 Thread Bill Broadley
I've set up several clusters over the years with OpenMPI. I often get the error below: WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.

[OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-21 Thread Siegmar Gross
Hi, I installed openmpi-dev-124-g91e9686 on Solaris 10 Sparc with gcc-4.9.1 to track down the error with my small Java program. I started single stepping in orterun.c at line 1081 and continued until I got the segmentation fault. I get "jdata = 0x0" in version openmpi-1.8.2a1r31804, which is the l

Re: [OMPI users] large memory usage and hangs when preconnecting beyond 1000 cpus

2014-10-21 Thread Nathan Hjelm
At those sizes it is possible you are running into resource exhaustion issues. Some of the resource exhaustion code paths still lead to hangs. If the code does not need to be fully connected I would suggest not using mpi_preconnect_mpi but instead tracking down why the initial MPI_Allreduce hangs. I w