Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-18 Thread Ralph Castain
Are the OMPI libraries and binaries installed at the same place on all the remote nodes? Are you setting the LD_LIBRARY_PATH correctly? Are the Torque libs available in the same place on the remote nodes? Remember, Torque runs mpirun on a backend node - not on the frontend. These are the most
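The first two questions above can be checked mechanically on each node. The following is a minimal sketch, not part of the original reply: the helper name `find_library` is hypothetical, and the library file names in the usage note (`libmpi.so`, `libtorque.so`) are assumptions about a typical install that may differ on your system.

```python
import os


def find_library(lib_name, search_path=None):
    """Return the directories on search_path that contain lib_name.

    search_path defaults to the current LD_LIBRARY_PATH (an empty
    string if the variable is unset).
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading/trailing/double colons.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits
```

Running `find_library("libmpi.so")` and `find_library("libtorque.so")` from inside a Torque job script, rather than on the frontend, shows what the backend node actually sees; an empty list for either suggests the path is not set up there.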

[OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-18 Thread Johann Knechtel
Hi all, Your help with the following torque integration issue will be much appreciated: whenever I try to start an openmpi job on more than one node, it simply does not start up on the nodes. The torque job fails with the following: > Fri Dec 18 22:11:07 CET 2009 > OpenMPI with PPU-GCC was loaded

Re: [OMPI users] Debugging spawned processes

2009-12-18 Thread Ashley Pittman
On Wed, 2009-12-16 at 12:06 +0100, jody wrote: > Has anybody got some hints on how to debug spawned processes? If you can live with the processes starting normally and attaching gdb to them after they have started then you could use padb. Assuming you only have one job active (replace -a with th

Re: [OMPI users] NetBSD OpenMPI - SGE - PETSc - PISM

2009-12-18 Thread Ralph Castain
Just as an FYI: I conned the original developer of much of that code into completing the patch over the holidays. So please stay tuned - he'll start with what you already gave us and fix the remaining len=0 problem in the IPv6 code. Ralph On Dec 18, 2009, at 12:19 PM, Jeff Squyres wrote: > On

Re: [OMPI users] NetBSD OpenMPI - SGE - PETSc - PISM

2009-12-18 Thread Jeff Squyres
On Dec 18, 2009, at 2:07 PM, Aleksej Saushev wrote: > > FWIW, we might want to move this discussion to the de...@open-mpi.org > > mailing list... > > I'd prefer some way that doesn't fill my inbox with yet more unrelated mail. > Is there a way to subscribe for posting only? So as not to bore ev

Re: [OMPI users] NetBSD OpenMPI - SGE - PETSc - PISM

2009-12-18 Thread Aleksej Saushev
Hello! [Note! PISM and PETSc mailing lists are removed from cc list.] Jeff Squyres writes: > On Dec 17, 2009, at 5:55 PM, wrote: > >> I guess this means that the PISM and PETSc guys can "stand easy" >> whilst the OpenMPI community needs to follow up on why there's >> a "addr.sa_len=0" creepi

Re: [OMPI users] error performing MPI_Comm_spawn

2009-12-18 Thread Marcia Cristina Cera
I tested my application with the snapshot and it works fine! thanks. márcia. On Thu, Dec 17, 2009 at 6:48 PM, Ralph Castain wrote: > Will be in the 1.4 nightly tarball generated later tonight... > > Thanks again > Ralph > > On Dec 17, 2009, at 4:07 AM, Marcia Cristina Cera wrote: > > very good n

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-18 Thread Nicolas Bock
Hi Ralph, I have confirmed that openmpi-1.4a1r22335 works with my master, slave example. The temporary directories are cleaned up properly. Thanks for the help! nick On Thu, Dec 17, 2009 at 13:38, Nicolas Bock wrote: > Ok, I'll give it a try. > > Thanks, nick > > > > On Thu, Dec 17, 2009 at

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-18 Thread Ralph Castain
The application is terminated and an error message is reported out: mpirun has exited due to process rank 0 with PID 72438 on node Ralph exiting improperly. There are two reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-18 Thread Katz, Jacob
Thanks for the fix. What will be the exact behavior after your fix? Re timeouts: Timeout may be indefinite for compliance with the standard. However, apps might optionally use it for their convenience, like in my case. No need to guess anything, but would prevent stuck apps. Unlike regular commu

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-18 Thread Ralph Castain
Unfortunately, the timeout won't work as there is no MPI requirement to call MPI_Init before some specific point in the application. This would create an experimental process to "guess" the correct timeout on an application-by-application basis - ugly. I have committed code to the OMPI trunk th

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-18 Thread Katz, Jacob
Yes, the scenario is as you described: one of the processes didn't call MPI_Init and exited "normally". All the rest of the processes got stuck forever in MPI_Init. Ideally, I would like to have a time-out setting for a process to call MPI_Init, which when expired would indicate a failure to sta
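Until a runtime-level fix is available, a stuck start-up like the one described above can at least be bounded from outside the job. The sketch below is an assumption-laden workaround, not an Open MPI feature: it puts a wall-clock deadline on the whole launch command, which is coarser than the per-process MPI_Init timeout asked for, and the helper name `run_with_deadline` is hypothetical.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd and return its exit code, or None if the deadline passed.

    subprocess.run kills the direct child when the timeout expires and
    then raises TimeoutExpired; only that direct child is killed here.
    """
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a hung job: a child that sleeps past the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_deadline(hung, 0.5))
```

In real use `cmd` would be the `mpirun` command line. Because only the launcher itself is killed, whether the remote ranks are cleaned up is not addressed by this sketch and would need to be checked separately.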