Re: [OMPI users] running external program on same processor (Fortran)

2010-03-07 Thread Ralph Castain
Attached are some simple examples (in C) that collectively do most of what you are trying to do. You have some args wrong in your call. See slave_spawn.c for how to use info_keys. HTH, Ralph [attachments: simple_spawn.c, slave_spawn.c, slave.c]

[OMPI users] Questions on /tmp/openmpi-sessions-userid directory

2010-03-07 Thread Gijsbert Wiesenekker
I was having non-reproducible hangs in an OpenMPI program. While troubleshooting this problem I found that there were many temporary directories in my /tmp/openmpi-sessions-userid directory (probably the result of MPI_Abort aborted OpenMPI programs). I cleaned those directories up and it looks l
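If you want to see how many stale job directories have piled up before cleaning them, a read-only check is easy to script. The C sketch below is an illustration under stated assumptions, not an Open MPI tool: `count_session_subdirs` is a hypothetical helper that counts the subdirectories of whatever session directory you pass it (for example the `/tmp/openmpi-sessions-userid` directory mentioned above; the exact name on your system may differ). It deletes nothing, and its count is only meaningful when none of your Open MPI jobs are running on that node.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: counts the subdirectories of `session_root`
   (e.g. a per-user Open MPI session directory; the path is an assumption).
   Returns -1 if the directory cannot be opened. Read-only: nothing is
   deleted. Symlinks and regular files are not counted. */
int count_session_subdirs(const char *session_root)
{
    DIR *dir = opendir(session_root);
    if (dir == NULL)
        return -1;

    int count = 0;
    char path[4096];
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries every directory contains. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", session_root, entry->d_name);
        struct stat st;
        /* lstat() so a symlink to a directory is not counted as one. */
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
            count++;
    }
    closedir(dir);
    return count;
}
```

Typical use is to call it once after a batch of aborted runs and compare against the number of jobs you expect to be active (normally zero when the node is idle).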

Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)

2010-03-07 Thread Jeff Squyres (jsquyres)
IBM and Sun (Oracle) have probably done the most heterogeneous testing, but it's probably not as stable as our homogeneous code paths. Terry/Brad, do you have any insight here? Yes, setting the eager limit high can impact performance. It's the amount of data that OMPI will send eagerly without waiti
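As a rough illustration of the eager limit described above, the sketch below models the send-side decision in plain C. It is a simplification, not Open MPI code: the names `choose_protocol` and `unexpected_buffer_bytes` are hypothetical, and the model assumes that a message at or below the limit is sent immediately (so the receiver may have to buffer it if no matching receive is posted yet), while a larger message first waits for a rendezvous handshake.

```c
#include <stddef.h>

/* Simplified model (an assumption, not Open MPI's implementation). */
typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } protocol_t;

/* Messages that fit within the eager limit are pushed to the receiver
   right away; larger ones wait for the receiver to be ready. */
protocol_t choose_protocol(size_t msg_bytes, size_t eager_limit)
{
    return msg_bytes <= eager_limit ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

/* Payload bytes the receiver may have to buffer if the message arrives
   before a matching receive is posted: all of an eager message, none of a
   rendezvous message (its payload is only sent after the handshake). */
size_t unexpected_buffer_bytes(size_t msg_bytes, size_t eager_limit)
{
    return choose_protocol(msg_bytes, eager_limit) == PROTO_EAGER ? msg_bytes : 0;
}
```

In this model, raising the limit moves more messages onto the eager path, which skips the handshake but can increase receiver-side buffering; that is one plausible reason a very high setting can hurt performance. In Open MPI the limit is set per transport through MCA parameters (for the TCP BTL, `btl_tcp_eager_limit`); `ompi_info --param btl tcp` lists the parameter names and defaults your build actually supports.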

Re: [OMPI users] Questions on /tmp/openmpi-sessions-userid directory

2010-03-07 Thread Reuti
Hi, On 07.03.2010 at 10:55, Gijsbert Wiesenekker wrote: I was having non-reproducible hangs in an OpenMPI program. While troubleshooting this problem I found that there were many temporary directories in my /tmp/openmpi-sessions-userid directory (probably the result of MPI_Abort aborted O

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2010-03-07 Thread Gijsbert Wiesenekker
On Jan 12, 2010, at 16:57, Eugene Loh wrote: > Jeff Squyres wrote: > >> It would be very strange for nanosleep to cause a problem for Open MPI -- it >> shouldn't interfere with any of Open MPI's mechanisms. Double check that >> your my_barrier() function is actually working properly -- remov

Re: [OMPI users] change hosts to restart the checkpoint

2010-03-07 Thread Fernando Lemos
On Fri, Mar 5, 2010 at 12:03 PM, Josh Hursey wrote: > This type of failure is usually due to prelink'ing being left enabled on one > or more of the systems. This has come up multiple times on the Open MPI > list, but is actually a problem between BLCR and the Linux kernel. BLCR has > a FAQ entry o

Re: [OMPI users] Questions on /tmp/openmpi-sessions-userid directory

2010-03-07 Thread Ralph Castain
On Mar 7, 2010, at 2:55 AM, Gijsbert Wiesenekker wrote: > I was having non-reproducible hangs in an OpenMPI program. While > troubleshooting this problem I found that there were many temporary > directories in my /tmp/openmpi-sessions-userid directory (probably the result > of MPI_Abort aborte

Re: [OMPI users] Questions on /tmp/openmpi-sessions-userid directory

2010-03-07 Thread David Turner
Hi Ralph > ... that is fixed in the upcoming 1.4.2 release. Can you say when this release will be generally available? Proliferating session directories are a problem for us too. -- Best regards, David Turner User Services Group email: dptur...@lbl.gov NERSC Division phone:

Re: [OMPI users] Questions on /tmp/openmpi-sessions-userid directory

2010-03-07 Thread Ralph Castain
I'm not sure what our release managers have in mind for an official release date. However, in the interim, you can always download the nightly 1.4.2 tarball that includes this fix, among others. http://www.open-mpi.org/nightly/v1.4/ It just hasn't been officially released because folks are stil

[OMPI users] Why might MPI_Recv trip PSM_MQ_RECVREQS_MAX ?

2010-03-07 Thread Jonathan Wesley Stone
Hi, My supercomputer has OpenMPI 1.4. I am running into a frustrating problem with my MPI program. I am using only the following calls, which I expect to be blocking: MPI_Wtime MPI_Error_string MPI_Abort MPI_Send MPI_Get_count MPI_Recv MPI_Probe MPI_Init MPI_Comm_rank MPI_Comm_size MPI_Finalize S