[OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-27 Thread Tom Bryan
I am in the process of setting up a grid engine (SGE) cluster for running Open MPI applications. I'll detail the setup below, but my current problem is that this call to Spawn_multiple never seems to return. // Spawn all of the children processes. _intercomm = MPI::COMM_WORLD.Spawn_multiple( _nPr

Re: [OMPI users] MPI_Comm_split and intercommunicator - Problem

2012-01-27 Thread Rodrigo Silva Oliveira
Hi Jeff, thanks for replying. Does it mean that you don't have it working properly yet? I read the thread on the devel list where you addressed the problem and a possible solution, but I was not able to find a conclusion about the problem. I'm in trouble without this function. Probably I'll need

Re: [OMPI users] pure static "mpirun" launcher

2012-01-27 Thread Jeff Squyres
Ah ha, I think I got it. There was actually a bug about disabling the memory manager in trunk/v1.5.x/v1.4.x. I fixed it on the trunk and scheduled it for v1.6 (since we're trying very hard to get v1.5.5 out the door) and v1.4.5. On the OMPI trunk on RHEL 5 with gcc 4.4.6, I can do this: ./con

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-27 Thread Jeff Squyres
Ah, I have an idea what might be happening here: I believe that valgrind is actually pretty smart. If you have a buffer of size 128, and gethostname() only fills in, say, the first 32 bytes (including the \0), the other 128-32=96 bytes are uninitialized. You can MPI_Allgather these, in which c
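The buffer layout described above can be sketched in plain C without MPI. This is only an illustration, assuming each rank contributes one fixed-size slot to the gathered buffer; `count_matching_hosts`, `slot_len`, and the parameter names are hypothetical and do not come from the thread's attached code. Bounding the comparison with `strncmp` keeps it inside one slot even if a slot has no terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: recv_buf is assumed to hold nprocs slots of
 * slot_len bytes each, one host name per slot, as an Allgather of
 * fixed-size buffers would produce. local_hostname must be
 * NUL-terminated. Returns how many slots equal local_hostname. */
int count_matching_hosts(const char *recv_buf, int nprocs,
                         size_t slot_len, const char *local_hostname)
{
    int matches = 0;
    for (int i = 0; i < nprocs; i++) {
        const char *slot = recv_buf + (size_t)i * slot_len;
        /* strncmp stops at the first NUL or after slot_len bytes,
         * so it never reads past the end of this slot. */
        if (strncmp(slot, local_hostname, slot_len) == 0) {
            matches++;
        }
    }
    return matches;
}
```

The comparison stops at the first `\0` in a slot, so bytes after the terminator are never examined by the comparison itself.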

Re: [OMPI users] MPI_Comm_split and intercommunicator - Problem

2012-01-27 Thread Jeff Squyres
Unfortunately, I think that this is a known problem with INTERCOMM_MERGE and COMM_SPAWN parents and children: https://svn.open-mpi.org/trac/ompi/ticket/2904 On Jan 26, 2012, at 12:11 PM, Rodrigo Oliveira wrote: > Hi there, I tried to understand the behavior Thatyene said and I think is a

Re: [OMPI users] pure static "mpirun" launcher

2012-01-27 Thread Jeff Squyres
I've tried a bunch of variations on this, but I'm actually getting stymied by my underlying OS not supporting static linking properly. :-\ I do see that Libtool is stripping out the "-static" standalone flag that you passed into LDFLAGS. Yuck. What's -Wl,-E? Can you try "-Wl,-static" instead

Re: [OMPI users] OpenMPI: How many connections?

2012-01-27 Thread Prentice Bisbal
I would like to nominate the quote below for the best explanation of how a piece of software works that I've ever read. Kudos, Jeff. On 01/26/2012 04:38 PM, Jeff Squyres wrote: > You send a message, a miracle occurs, and the message is received on the > other side. -- Prentice

[OMPI users] MPI_Barrier, again

2012-01-27 Thread Evgeniy Shapiro
Hi, I have a strange problem with MPI_Barrier occurring when writing to a file. The output subroutine (the code is in FORTRAN) is called from the main program and there is an MPI_Barrier just before the call. In the subroutine 1. Process 0 checks whether the first file exists and, if not, - creat

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-27 Thread Gabriele Fatigati
Dear Ralph, thanks for the suggestion, but it doesn't solve the problem :( The warning still exists. 2012/1/27 Ralph Castain > I suspect that valgrind doesn't recognize that MPI_Allgather will ensure > that hostname_recv_buf is filled prior to calling strcmp. If you want to > eliminate the warning, y

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-27 Thread Ralph Castain
I suspect that valgrind doesn't recognize that MPI_Allgather will ensure that hostname_recv_buf is filled prior to calling strcmp. If you want to eliminate the warning, you should memset hostname_recv_buf to 0 so it has a guaranteed value. On Jan 27, 2012, at 6:21 AM, Gabriele Fatigati wrote:
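A minimal sketch of the memset suggestion, assuming the receive buffer holds one fixed-size slot per rank. The helper name `alloc_hostname_recv_buf` and its parameters are hypothetical; only `hostname_recv_buf` appears in the thread. Note that Gabriele's follow-up reports the warning persisted, so this shows the technique rather than a confirmed fix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a receive buffer with one slot of
 * slot_len bytes per rank and set every byte to 0, so each byte has
 * a defined value before anything is gathered into it.
 * Returns NULL on invalid arguments or allocation failure. */
char *alloc_hostname_recv_buf(int nprocs, size_t slot_len)
{
    if (nprocs <= 0 || slot_len == 0) {
        return NULL;
    }
    size_t total = (size_t)nprocs * slot_len;
    char *hostname_recv_buf = malloc(total);
    if (hostname_recv_buf == NULL) {
        return NULL;
    }
    /* The memset suggested in the message above. */
    memset(hostname_recv_buf, 0, total);
    return hostname_recv_buf;
}
```

`calloc(nprocs, slot_len)` would have the same effect in one call; the explicit memset is kept here to mirror the wording of the suggestion.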

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-27 Thread Gabriele Fatigati
Hi Jeff, yes, a very stupid bug in the code, but even with the correction the problem with Valgrind in strcmp remains: ==21779== Conditional jump or move depends on uninitialised value(s) ==21779==    at 0x4A0898C: strcmp (mc_replace_strmem.c:711) ==21779==    by 0x400BA8: main (all_gather.c:28) ==21

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Jeff Squyres
On Jan 27, 2012, at 5:12 AM, Brett Tully wrote: > Looking at the change log for 1.5.1 I see: > - Use memmove (instead of memcpy) when necessary (e.g., source and > destination overlap). Checking the logs, it looks like that fix was in 1.4.3, too. Do you know if your application has sends/receiv
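The memmove/memcpy distinction in that change-log entry can be illustrated with a small standalone sketch. `shift_right` is a hypothetical helper written for this example and is not taken from Open MPI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first count bytes of buf to
 * buf + offset within the same buffer. The caller must ensure that
 * buf has room for offset + count bytes. */
void shift_right(char *buf, size_t count, size_t offset)
{
    /* Source [0, count) and destination [offset, offset + count)
     * overlap whenever offset < count. memcpy has undefined behavior
     * for overlapping regions; memmove is defined for them. */
    memmove(buf + offset, buf, count);
}
```

For example, calling `shift_right(buf, 6, 2)` on a 16-byte buffer that starts with "abcdef" leaves it starting with "ababcdef".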

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-27 Thread Jeff Squyres
I see one problem: gethostname(local_hostname, sizeof(local_hostname)); That should be: gethostname(local_hostname, max_name_len); because sizeof(local_hostname) will be sizeof(void*). But if that's what you were intending, just to simulate a small hostname buffer, then be aware that
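A short sketch of the pitfall described above, assuming `local_hostname` is a heap-allocated pointer rather than an array (which is what makes `sizeof` return the pointer size). The wrapper `get_local_hostname` is hypothetical; `local_hostname` and `max_name_len` are the names used in the message.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for gethostname() under strict -std modes */
#endif
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return a newly allocated, NUL-terminated copy
 * of the host name in a buffer of max_name_len bytes, or NULL on
 * failure. The caller frees the result. */
char *get_local_hostname(size_t max_name_len)
{
    if (max_name_len == 0) {
        return NULL;
    }
    char *local_hostname = calloc(max_name_len, 1);
    if (local_hostname == NULL) {
        return NULL;
    }
    /* local_hostname is a pointer, so sizeof(local_hostname) would be
     * sizeof(char *), not the buffer size. Pass the real length. */
    if (gethostname(local_hostname, max_name_len) != 0) {
        free(local_hostname);
        return NULL;
    }
    /* POSIX leaves termination unspecified if the name is truncated. */
    local_hostname[max_name_len - 1] = '\0';
    return local_hostname;
}
```

If `local_hostname` were declared as an array, for example `char local_hostname[128]`, `sizeof` would give the full array size and the original call would be fine.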

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread TERRY DONTJE
ompi_info should tell you the current version of Open MPI your path is pointing to. Are you sure your path is pointing to the area that the OpenFOAM package delivered Open MPI into? --td On 1/27/2012 5:02 AM, Brett Tully wrote: Interesting. In the same set of updates, I installed OpenFOAM from

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Brett Tully
Looking at the change log for 1.5.1 I see: - Use memmove (instead of memcpy) when necessary (e.g., source and destination overlap). It seems as though this might be a likely candidate for a change that might fix my problems if I am indeed using 1.5.3 following the installation of OpenFOAM? On Fri

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Brett Tully
Interesting. In the same set of updates, I installed OpenFOAM from their Ubuntu deb package and it claims to ship with openmpi. I just downloaded their Third-party source tar and unzipped it to see what version of openmpi they are using, and it is 1.5.3. However, when I do man openmpi, or ompi_info

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-27 Thread Gabriele Fatigati
Sorry, this is the right code. 2012/1/27 Gabriele Fatigati > Hi Jeff, > > The problem occurs when I use strcmp on the Allgather buffer: Valgrind > raises a warning. > > Please check if the attached code is right, where size(local_hostname) is > very small. > > Valgrind is used as: > > mpirun val

Re: [OMPI users] MPI_AllGather null terminator character

2012-01-27 Thread Gabriele Fatigati
Hi Jeff, The problem occurs when I use strcmp on the Allgather buffer: Valgrind raises a warning. Please check if the attached code is right, where size(local_hostname) is very small. Valgrind is used as: mpirun valgrind --leak-check=full --tool=memcheck ./all_gather and openmpi/1.4.4 compiled