Re: [OMPI users] collective algorithms

2014-11-18 George Bosilca
Daniel, many papers have been published about the performance modeling of different collective communication algorithms (and fortunately these models are implementation independent). I can point you to our research on collective modeling, which is the underlying infrastructure behind the decisi

Re: [OMPI users] UC job running out of memory

2014-11-18 Jerry Mersel
Thanks. I'll check. With Blessings, always, Jerry Mersel System Administrator IT Infrastructure Branch | Division of Information Systems Weizmann Institute of Science Rehovot 76100, Israel Tel: +972-8-9342363 "allow our heart, the hear

Re: [OMPI users] job running out of memory

2014-11-18 Jerry Mersel
Thank you for your response. I will investigate further. With Blessings, always, Jerry Mersel System Administrator IT Infrastructure Branch | Division of Information Systems Weizmann Institute of Science Rehovot 76100, Israel Tel: +972-

Re: [OMPI users] job running out of memory

2014-11-18 Ralph Castain
Unfortunately, there is no way to share memory across nodes. Running out of memory as you describe can be due to several factors, including most typically:
* a memory leak in the application, or the application simply growing too big for the environment
* one rank running slow, causing it to buil

Re: [OMPI users] UC job running out of memory

2014-11-18 Rushton Martin
I've seen several suggestions for "home-brew" systems, usually modifying the paging mechanism. However, there is one commercial solution I have seen advertised: https://numascale.com/index.html I've never used them and have no idea whether they are any good or as good as they claim; you'll have to d

[OMPI users] job running out of memory

2014-11-18 Jerry Mersel
Hi all: I am running openmpi 1.6.5 and a memory-intensive job. The job runs on 7 hosts using 16 cores on each. On one of the hosts the memory is exhausted, so the kernel starts to kill the processes. There may well be plenty of free memory on one of the other hosts. Is

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 Ralph Castain
Best guess is you are seeing a race condition. If a proc fails immediately, we respond by aborting the launch of any other local processes, as we are going to kill the entire job. So if I get several of them started before the first one aborts, then any remaining ones will never get spawned, an

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 Michael.Rachner
Tip: Intel Fortran compiler problems can be reported to Intel here: https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x Greetings, Michael Rachner From: users [mailto:users-boun...@open-mpi.org] On Behalf Of John Bray Sent: Tuesday, 18 November 2014 11:0

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 John Bray
The original problem used a separate file and not a module. It's clearly a bizarre Intel bug; I am only continuing to pursue it here because I'm curious as to why the segfault messages disappear at higher process counts. John On 18 November 2014 09:58, wrote: > It may be possibly a bug in Intel-15.0

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 Michael.Rachner
It may possibly be a bug in Intel-15.0. I suspect it has to do with the contains block and with the fact that you call an intrinsic subroutine in that contains block. Normally this should work. You may try to separate the influence of the two: what happens with these 3 variants of your code: variant a

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 John Bray
A delightful bug, this: you get a segfault if your code contains a random_number call and is compiled with -fopenmp, EVEN IF YOU CANNOT CALL IT!

program fred
  use mpi
  integer :: ierr
  call mpi_init(ierr)
  print *,"hello"
  call mpi_finalize(ierr)
contains
  subroutine sub
    real :: a(10)
    call rando