Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Rayson Ho
On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan wrote: > For our use, yes, spawn_multiple makes sense. We won't be spawning lots and > lots of jobs in quick succession. We're using MPI as a robust way to get > IPC as we spawn multiple child processes while using SGE to help us with > load balancing …
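For context, the call under discussion looks roughly like the sketch below. It is a minimal, hypothetical example: the child executable names ("worker_a", "worker_b") and the process counts are placeholders, not taken from this thread.

    /* Minimal sketch of MPI_Comm_spawn_multiple: launch two different
     * child executables in a single call and talk to them over the
     * returned intercommunicator.  Names and counts are placeholders. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        char *commands[2] = { "worker_a", "worker_b" };
        int maxprocs[2] = { 2, 2 };
        MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
        int errcodes[4];
        MPI_Comm intercomm;

        MPI_Init(&argc, &argv);

        /* One call spawns both groups of children. */
        MPI_Comm_spawn_multiple(2, commands, MPI_ARGVS_NULL, maxprocs,
                                infos, 0, MPI_COMM_SELF, &intercomm,
                                errcodes);

        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }

Under a tightly integrated SGE parallel environment the spawned children count against the slots SGE granted, which is why, as noted later in the thread, a job that wants 4 additional tasks needs 5 slots in total.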

Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...

2012-01-31 Thread Götz Waschk
On Mon, Jan 30, 2012 at 5:11 PM, Richard Walsh wrote: > I have not seen this mpirun error with the OpenMPI version I have built > with Intel 12.1 and the mpicc fix: > openmpi-1.5.5rc1.tar.bz2 Hi, I haven't tried that version yet. I was trying to build a supplementary package to the openmpi 1.5.3

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-31 Thread Gabriele Fatigati
Dear Jeff, I have very interesting news: I recompiled OpenMPI 1.4.4 with the memchecker enabled. Now the warning on strcmp has disappeared, even without initializing the buffers with memset! So is the warning a false positive? Is my simple code safe? Thanks. 2012/1/28 Jeff Squyres …
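The pattern being discussed is roughly the following (a generic sketch, not Gabriele's actual program): each rank zeroes a fixed-size buffer with memset, writes a short string into it, gathers all buffers with MPI_Allgather, and compares them with strcmp. The memset does not change what strcmp sees; it only makes the padding bytes after the terminator defined, which is what memory checkers complain about.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define LEN 64

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        char name[LEN];
        memset(name, 0, LEN);              /* define every byte, not just up to '\0' */
        snprintf(name, LEN, "rank-%d", rank);

        char *all = malloc((size_t)size * LEN);
        MPI_Allgather(name, LEN, MPI_CHAR, all, LEN, MPI_CHAR, MPI_COMM_WORLD);

        /* strcmp stops at the terminator, so the comparison result is the
         * same with or without the memset. */
        if (strcmp(&all[rank * LEN], name) != 0)
            printf("rank %d: unexpected mismatch\n", rank);

        free(all);
        MPI_Finalize();
        return 0;
    }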

Re: [OMPI users] [openib] segfault when using openib btl

2012-01-31 Thread Eloi Gaudry
Hi, I would just like to give you an update on this issue. Since we have been using OpenMPI-1.4.4, we cannot reproduce it anymore. Regards, Eloi On 09/29/2010 06:01 AM, Nysal Jan wrote: Hi Eloi, We discussed this issue during the weekly developer meeting & there were no further suggestions, apart …

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 06:33, Rayson Ho wrote: > On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan wrote: >> For our use, yes, spawn_multiple makes sense. We won't be spawning lots and >> lots of jobs in quick succession. We're using MPI as a robust way to get >> IPC as we spawn multiple child processes …

[OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Dave Love
This is to help anyone else having this problem, as it doesn't seem to be mentioned anywhere I can find, rather surprisingly. Core binding is broken on Interlagos with open-mpi 1.5.4. I guess it also bites on Magny-Cours, but all our systems are currently busy and I can't check. It does work, at least basically, in 1.5.5rc1, but …

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-31 Thread Jeff Squyres
On Jan 31, 2012, at 3:59 AM, Gabriele Fatigati wrote: > I have very interesting news: I recompiled OpenMPI 1.4.4 with the > memchecker enabled. > > Now the warning on strcmp has disappeared, even without initializing the buffers > with memset! > > So is the warning a false positive? Is my simple code safe? …

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Jeff Squyres
On Jan 31, 2012, at 6:18 AM, Dave Love wrote: > Core binding is broken on Interlagos with open-mpi 1.5.4. I guess it > also bites on Magny-Cours, but all our systems are currently busy and I > can't check. > > It does work, at least basically, in 1.5.5rc1, but the release notes for > that don't …

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Brice Goglin
On 31/01/2012 14:24, Jeff Squyres wrote: > On Jan 31, 2012, at 6:18 AM, Dave Love wrote: > >> Core binding is broken on Interlagos with open-mpi 1.5.4. I guess it >> also bites on Magny-Cours, but all our systems are currently busy and I >> can't check. >> >> It does work, at least basically, in 1.5.5rc1 …

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Jeff Squyres
On Jan 31, 2012, at 8:49 AM, Brice Goglin wrote: > Unless I am mistaken, OMPI 1.5.4 has hwloc 1.2 Correct. > while 1.5.5 will have > 1.2.2 or even 1.3.1. So don't use core binding on Interlagos with > OMPI <= 1.5.4. OMPI 1.5.5rc1 has hwloc 1.3.1 + a few SVN commits past it. Per some off-list discussion …
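Since the binding behaviour tracks the hwloc version bundled with each Open MPI release, a quick way to see which hwloc a node's standalone installation reports, and how many cores and hardware threads it detects, is a small diagnostic like the sketch below. This is my own illustration, compiled against an external hwloc (e.g. cc check.c -lhwloc); it is not part of Open MPI.

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;

        /* Runtime hwloc API version (hex-encoded major/minor/release). */
        printf("hwloc API version: 0x%x\n", hwloc_get_api_version());

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        int npus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
        printf("cores: %d, hardware threads (PUs): %d\n", ncores, npus);

        hwloc_topology_destroy(topo);
        return 0;
    }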

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 05:33, Tom Bryan wrote: >> Suppose you want to start 4 additional tasks, you would need 5 in total from >> SGE. > > OK, thanks. I'll try other values. BTW: there is a setting in the PE definition to allow one additional task: $ qconf -sp openmpi ... job_is_first_task FALSE

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-31 Thread Gabriele Fatigati
Ok Jeff, thanks very much for your support! Regards, 2012/1/31 Jeff Squyres > On Jan 31, 2012, at 3:59 AM, Gabriele Fatigati wrote: > > I have very interesting news: I recompiled OpenMPI 1.4.4 with the > memchecker enabled. > > > > Now the warning on strcmp has disappeared, even without initializing the buffers …

[OMPI users] OpenMPI / SLURM -> Send/Recv blocking

2012-01-31 Thread adrian sabou
Hi All, I'm having this weird problem when running a very simple OpenMPI application. The application sends an integer from the rank 0 process to the rank 1 process. The sequence of code that I use to accomplish this is the following: if (rank == 0) { printf( …
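The snippet is cut off in the archive; a minimal reconstruction of the pattern being described (rank 0 sends one integer to rank 1) would look like the sketch below, which is not necessarily the poster's exact code.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            printf("Process 0 sending %d to process 1\n", value);
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }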

Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...

2012-01-31 Thread Richard Walsh
Götz, Sorry, I was in a rush and missed that. Here is some further information on the compiler options I used for the 1.5.5 build: [richard.walsh@bob linux]$ pwd /share/apps/openmpi-intel/1.5.5/build/opal/mca/memory/linux [richard.walsh@bob linux]$ make -n malloc.o echo " CC" malloc.o; …

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Dave Love
Brice Goglin writes: > Note that Magny-Cours processors are OK, cores are "normal" there. Apologies for the bad guess about the architecture, and thanks for the info. > FWIW, the Linux kernel (at least up to 3.2) still reports wrong L2 and > L1i cache information on AMD Bulldozer. Kernel bug reported at https://bugzilla.kernel.org/show_bug.cgi?id=42607 …

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Dave Love
Reuti writes: > Maybe it's a side effect of a tight integration that it would start on > the correct nodes (but I face an incorrect allocation of slots and an > error message at the end if started without mpiexec), as in this case > it has no command line option for the hostfile. How to get the …

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Jeff Squyres
I only noticed after the fact that Tom is also here at Cisco (it's a big company, after all :-) ). I've contacted him using our proprietary super-secret Cisco handshake (i.e., the internal phone network); I'll see if I can figure out the issues off-list. On Jan 31, 2012, at 1:08 PM, Dave Love wrote: …

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 20:12, Jeff Squyres wrote: > I only noticed after the fact that Tom is also here at Cisco (it's a big > company, after all :-) ). > > I've contacted him using our proprietary super-secret Cisco handshake (i.e., > the internal phone network); I'll see if I can figure out the issues off-list. …

[OMPI users] Segfault on mpirun with OpenMPI 1.4.5rc2

2012-01-31 Thread Daniel Milroy
Hello, I have built OpenMPI 1.4.5rc2 with the Intel 12.1 compilers in an HPC environment. We are running RHEL 5, kernel 2.6.18-238, with Intel Xeon X5660 CPUs. You can find my build options below. In an effort to test the OpenMPI build, I compiled "Hello world" with an mpi_init call in C and Fortran …
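A C test of the kind described, one MPI_Init call plus a print, is essentially the classic sketch below (a generic example, not the poster's exact source):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello world from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }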

Re: [OMPI users] Segfault on mpirun with OpenMPI 1.4.5rc2

2012-01-31 Thread Jeff Squyres
We have heard reports of failures with the Intel 12.1 compilers. Can you try with rc4 (that was literally just released) with the --without-memory-manager configure option? On Jan 31, 2012, at 2:19 PM, Daniel Milroy wrote: > Hello, > > I have built OpenMPI 1.4.5rc2 with Intel 12.1 compilers in an HPC environment. …

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Ralph Castain
Not sure I fully grok this thread, but will try to provide an answer. When you start a singleton, it spawns off a daemon that is the equivalent of "mpirun". This daemon is created for the express purpose of allowing the singleton to use MPI dynamics like comm_spawn - without it, the singleton …
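As a hypothetical illustration of the singleton case Ralph describes: a parent like the sketch below can be started directly (./parent, without mpirun), and its MPI_Comm_spawn call is serviced by the daemon that the singleton start-up forks behind the scenes. The child executable name "child" and the count are placeholders.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm children;
        int errcodes[2];

        MPI_Init(&argc, &argv);

        /* Spawn two copies of "child"; the returned intercommunicator
         * links the parent with the spawned processes. */
        MPI_Comm_spawn("child", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, errcodes);

        MPI_Comm_disconnect(&children);
        MPI_Finalize();
        return 0;
    }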

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 20:38, Ralph Castain wrote: > Not sure I fully grok this thread, but will try to provide an answer. > > When you start a singleton, it spawns off a daemon that is the equivalent of > "mpirun". This daemon is created for the express purpose of allowing the > singleton to use MPI dynamics like comm_spawn …

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Brice Goglin
On 31/01/2012 19:02, Dave Love wrote: >> FWIW, the Linux kernel (at least up to 3.2) still reports wrong L2 and >> L1i cache information on AMD Bulldozer. Kernel bug reported at >> https://bugzilla.kernel.org/show_bug.cgi?id=42607 > I assume that isn't relevant for open-mpi, just other things.

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Ralph Castain
On Jan 31, 2012, at 12:58 PM, Reuti wrote: > > On 31.01.2012 at 20:38, Ralph Castain wrote: > >> Not sure I fully grok this thread, but will try to provide an answer. >> >> When you start a singleton, it spawns off a daemon that is the equivalent of >> "mpirun". This daemon is created for the express purpose …

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Jeff Squyres
On Jan 31, 2012, at 3:20 PM, Brice Goglin wrote: > In 1.5.x, cache info doesn't matter as far as I know. > > In trunk, the affinity code has been reworked. I think you can bind > process to caches there. Binding to L2 wouldn't work as expected (would > bind to one core instead of 2). hwloc doesn'