We use Torque with OMPI here on almost every cluster, running 64-bit
jobs with the Intel compilers, so I doubt the problem is with Torque.
It is probably an issue with library paths.
Torque doesn't automatically forward your environment, nor does it
execute your remote .bashrc (or equivalent) when starting your remote
process. While ssh also typically doesn't forward the environment
(though your sys admin may have set it up to do so), it does execute
the remote .bashrc, which could be setting the correct path. I should
also note that mpirun will automatically forward LD_LIBRARY_PATH and
PATH for you, which differs from what we do for the other launchers.
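If you want to see the difference for yourself, a quick check along these
lines can help (the node name and program name are just taken from your
example below; whether the ssh line picks up your Intel paths depends on
how your .bashrc is written):

  # From the submit host: what does a fresh ssh shell on a compute node see?
  ssh n89 'echo $LD_LIBRARY_PATH'

  # From inside your interactive qsub shell: what will Torque-launched
  # processes inherit?
  echo $LD_LIBRARY_PATH
  ldd ./MPI_li_64 | grep "not found"   # any libraries that don't resolve here?

If the Intel runtime directories show up in the first case but not the
second, that could explain a segfault that only appears under Torque.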
If you execute your MPI_li_64 program locally on each of your nodes
(i.e., both processes run locally), does it work? If so, then try adding
-x LD_LIBRARY_PATH
to your mpirun command line. This tells mpirun to pick up your local
library path and forward it for you regardless of the launch environment.
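For example (just a sketch, using the program name from your message; -x
can be given more than once if you also need PATH or other variables
forwarded):

  mpirun -np 2 -x LD_LIBRARY_PATH ./MPI_li_64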
On Aug 11, 2009, at 10:17 PM, Sims, James S. Dr. wrote:
Back to this problem.
The last suggestion was to upgrade to 1.3.3, which has been done. I still
cannot get this code to run in 64-bit mode with torque. What I can do is
run the job in 64-bit mode using a hostfile.
Specifically, if I use
qsub -I -l nodes=2:ppn=1
torque allocates two nodes to the job and, since this is an interactive
shell, logs me in to the controlling node. In this example, process
rank 0 is on n72 and process rank 1 is on n89:
[sims@n72 4000]$ mpirun --display-allocation -pernode --display-map hostname

====================== ALLOCATED NODES ======================
 Data for node: Name: n72.clust.nist.gov  Num slots: 1  Max slots: 0
 Data for node: Name: n89                 Num slots: 1  Max slots: 0
=================================================================

======================== JOB MAP ========================
 Data for node: Name: n72.clust.nist.gov  Num procs: 1
        Process OMPI jobid: [47657,1]  Process rank: 0
 Data for node: Name: n89  Num procs: 1
        Process OMPI jobid: [47657,1]  Process rank: 1
=============================================================

n89
n72.clust.nist.gov
My hostfile is
[sims@n72 4000]$ cat hostfile
n72
n89
If, logged in to n72, I use the command
mpirun -np 2 ./MPI_li_64
the job fails with:
mpirun noticed that process rank 1 with PID 10538 on node n89 exited on signal 11 (Segmentation fault).
If I use the command
mpirun -np 2 --hostfile hostfile ./MPI_li_64
the same thing happens.
However, if I ssh to n73, for example, and use the command
mpirun -np 2 --hostfile hostfile ./MPI_li_64
everything works fine. So it appears that the problem is with torque.
Any ideas?