Could you send the contents of a PBS_NODEFILE from a Torque 2.3.7 allocation, and the man page for tm_spawn?
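
A minimal sketch for collecting the PBS_NODEFILE part. It assumes a POSIX shell running inside a Torque job, either in the job script or in an interactive qsub -I session. Torque points PBS_NODEFILE at a file that lists one hostname per allocated slot. The function name show_nodefile is hypothetical and is not part of Torque or Open MPI.

```shell
# Hypothetical helper: print the Torque node file and a per-host slot count.
# Inside a job, Torque sets PBS_NODEFILE to a file listing one hostname per
# allocated slot, so a host with 8 slots appears 8 times.
# An explicit file can be passed as $1 instead (useful for testing).
show_nodefile() {
    nodefile="${1:-$PBS_NODEFILE}"
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "PBS_NODEFILE is not set or not readable; run this inside a Torque job" >&2
        return 1
    fi
    echo "== raw contents of $nodefile =="
    cat "$nodefile"
    echo "== slots per host =="
    sort "$nodefile" | uniq -c
}
```

For example, adding show_nodefile > nodefile.txt to the job script (the file name is arbitrary) would capture both the raw file and the per-host counts for the list.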

My only guess is that something changed in those areas, since we don't use anything else from Torque, and we run on Torque-based clusters in production every day. I'm not sure which version we have here, though I believe it is fairly current (I'll check).

You might also want to configure OMPI 1.3.3 with --enable-debug. You could then do a run with -mca ras_base_verbose 5 -mca plm_base_verbose 5 --debug-daemons on your mpirun command line to get step-by-step diagnostic output of the interaction with Torque. That should give us some idea of where the failure is occurring.
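
The diagnostic run suggested above might look roughly like this. It assumes the Open MPI 1.3.3 source tree is rebuilt with the same --with-tm option plus --enable-debug, and that mpirun is called from inside the Torque job so the tm components are actually exercised. Only the configure options and the mpirun options come from this message; the wrapper name run_tm_diagnostics and the log file name are hypothetical.

```shell
# Assumed rebuild step, run in the Open MPI 1.3.3 source tree:
#   ./configure --with-tm --enable-debug && make && make install
# (add --prefix=... or --with-tm=<torque install dir> as your site requires).

# Hypothetical wrapper: launch the program under mpirun with verbose RAS
# (resource allocation) and PLM (process launch) output plus daemon
# debugging, sending mpirun's stdout and stderr to a log file.
# Usage: run_tm_diagnostics <logfile> <program> [args...]
run_tm_diagnostics() {
    log="$1"; shift
    mpirun -mca ras_base_verbose 5 -mca plm_base_verbose 5 --debug-daemons "$@" > "$log" 2>&1
}
```

Inside the job script, run_tm_diagnostics ompi-debug.log ./my_program (program name hypothetical) would collect mpirun's diagnostic output in ompi-debug.log for posting to the list.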

Ralph

On Jul 31, 2009, at 7:20 AM, Wilko Keegstra wrote:

hi,

I have the following problem:

I am using openmpi 1.3.3

Programs started with mpiexec (directly and from scripts) run
fine.

Programs (directly and from scripts) submitted through Torque 2.3.7,
with Open MPI compiled with --with-tm (and torque-devel installed),
segfault.

Programs submitted directly through Torque 2.3.7, with Open MPI
compiled without --with-tm (and NO torque-devel installed), run fine.
However, mpiexec programs started from a script (the script submitted
through Torque) run on only 1 node, so I need Open MPI compiled with
--with-tm.

We also have a cluster running Open MPI 1.2.9 compiled without
--with-tm, in combination with Torque 2.3.3, and everything runs
fine there: NO segfaults, and mpiexec from a script also runs on the
nodes selected at submission time.

I don't see errors in any log files, only in the job log file:

---------------------------------------------------------------------------
mpiexec noticed that process rank 7 with PID 3150 on node
rugem21.chem.rug.nl exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Could anyone please help me?
Many thanks in advance,
Wilko Keegstra

--
+-------------------------------------------------------------+
| Dr. Wilko Keegstra    priv.phone: +31594514153,+31610477915 |
| Groningen University       email: w.keegs...@rug.nl         |
| Groningen Biomolecular Sciences and Biotechnology Institute |
| Nijenborgh 4               phone: +31503634224              |
| 9747 AG GRONINGEN          fax  : +31503634800              |
| The Netherlands                                             |
+-------------------------------------------------------------+
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
