Are the OMPI libraries and binaries installed in the same place on all the
remote nodes?
Are you setting LD_LIBRARY_PATH correctly?
Are the Torque libs available in the same place on the remote nodes? Remember,
Torque runs mpirun on a backend node - not on the frontend.
These are the most common causes of this kind of startup failure.
Hi all,
Your help with the following Torque integration issue will be much
appreciated: whenever I try to start an Open MPI job on more than one
node, it simply does not start up on the nodes.
The Torque job fails with the following:
> Fri Dec 18 22:11:07 CET 2009
> OpenMPI with PPU-GCC was loaded
On Wed, 2009-12-16 at 12:06 +0100, jody wrote:
> Has anybody got some hints on how to debug spawned processes?
If you can live with the processes starting normally and with attaching gdb
to them after they have started, then you could use padb.
Assuming you only have one job active, padb's -a option will target it
(otherwise replace -a with the job id).
Just as an FYI: I conned the original developer of much of that code into
completing the patch over the holidays. So please stay tuned - he'll start with
what you already gave us and fix the remaining len=0 problem in the IPv6 code.
Ralph
On Dec 18, 2009, at 12:19 PM, Jeff Squyres wrote:
> On
On Dec 18, 2009, at 2:07 PM, Aleksej Saushev wrote:
> > FWIW, we might want to move this discussion to the de...@open-mpi.org
> > mailing list...
>
> I'd prefer some way that doesn't fill my inbox with yet more unrelated mail.
> Is there a way to subscribe for posting only?
So as not to bore ev
Hello!
[Note! PISM and PETSc mailing lists are removed from the cc list.]
Jeff Squyres writes:
> On Dec 17, 2009, at 5:55 PM, wrote:
>
>> I guess this means that the PISM and PETSc guys can "stand easy"
>> whilst the OpenMPI community needs to follow up on why there's
>> a "addr.sa_len=0" creepi
I tested my application with the snapshot and it works fine!
thanks.
márcia.
On Thu, Dec 17, 2009 at 6:48 PM, Ralph Castain wrote:
> Will be in the 1.4 nightly tarball generated later tonight...
>
> Thanks again
> Ralph
>
> On Dec 17, 2009, at 4:07 AM, Marcia Cristina Cera wrote:
>
> very good n
Hi Ralph,
I have confirmed that openmpi-1.4a1r22335 works with my master/slave
example. The temporary directories are cleaned up properly.
Thanks for the help!
nick
On Thu, Dec 17, 2009 at 13:38, Nicolas Bock wrote:
> Ok, I'll give it a try.
>
> Thanks, nick
>
>
>
> On Thu, Dec 17, 2009 at
The application is terminated and an error message is reported:
mpirun has exited due to process rank 0 with PID 72438 on
node Ralph exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits for
all processes to call "init".
Thanks for the fix. What will be the exact behavior after your fix?
Re timeouts: the timeout could default to indefinite, for compliance with the
standard, but apps could optionally set it for their convenience, as in my case.
Nothing would need to be guessed, and it would prevent stuck apps.
Unlike regular commu
Unfortunately, the timeout won't work, as there is no MPI requirement to call
MPI_Init before some specific point in the application. It would turn into an
experimental process of "guessing" the correct timeout on an
application-by-application basis - ugly.
I have committed code to the OMPI trunk th
Yes, the scenario is as you described: one of the processes didn't call
MPI_Init and exited "normally". All the rest of the processes got stuck forever
in MPI_Init.
Ideally, I would like to have a time-out setting for a process to call
MPI_Init which, when expired, would indicate a failure to start.
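For the archives, here is a minimal sketch that reproduces the scenario
described here; using the OMPI_COMM_WORLD_RANK environment variable to pick
the "failing" rank is only an illustrative stand-in for whatever made the real
process exit before MPI_Init.

/* init_hang.c - one rank exits before MPI_Init, the others enter MPI_Init.
 * Build: mpicc init_hang.c -o init_hang
 * Run:   mpirun -np 4 ./init_hang
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Open MPI's mpirun exports OMPI_COMM_WORLD_RANK to each process, so it
     * is readable even before MPI_Init is called. */
    const char *rank_str = getenv("OMPI_COMM_WORLD_RANK");
    if (rank_str != NULL && atoi(rank_str) == 0) {
        fprintf(stderr, "rank 0: exiting without calling MPI_Init\n");
        return EXIT_FAILURE;          /* exits "normally", never calls init */
    }

    MPI_Init(&argc, &argv);           /* the remaining ranks wait here */
    MPI_Finalize();
    return EXIT_SUCCESS;
}

Without the fix the remaining ranks block in MPI_Init indefinitely; with the
trunk change discussed above, mpirun should instead terminate the job and
print the "exiting improperly" message quoted above in this digest.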