Back to this problem.

The last suggestion was to upgrade to Open MPI 1.3.3, which has been done. I still cannot get this code to run in 64-bit mode under Torque. What I can do is run the job in 64-bit mode using a hostfile.
Specifically, if I use
qsub -I -l nodes=2:ppn=1
Torque allocates two nodes to the job and, since this is an interactive shell, logs me in to the controlling node. In this example process rank 0 is n72 and process rank 1 is n89:
[sims@n72 4000]$ mpirun --display-allocation -pernode --display-map hostname

======================   ALLOCATED NODES   ======================

 Data for node: Name: n72.clust.nist.gov        Num slots: 1    Max slots: 0
 Data for node: Name: n89       Num slots: 1    Max slots: 0

=================================================================

 ========================   JOB MAP   ========================

 Data for node: Name: n72.clust.nist.gov        Num procs: 1
        Process OMPI jobid: [47657,1] Process rank: 0

 Data for node: Name: n89       Num procs: 1
        Process OMPI jobid: [47657,1] Process rank: 1

 =============================================================
n89
n72.clust.nist.gov

My hostfile is
[sims@n72 4000]$ cat hostfile
n72
n89


If, logged in to n72, I use the command
mpirun -np 2 ./MPI_li_64
the job fails with:
mpirun noticed that process rank 1 with PID 10538 on node n89 exited on signal 11 (Segmentation fault).

If I use the command 
mpirun -np 2 --hostfile hostfile ./MPI_li_64 
the same thing happens.

However, if I ssh to n73, for example, and use the command 
mpirun -np 2 --hostfile hostfile ./MPI_li_64
everything works fine. Since the only difference is launching from inside versus outside the Torque allocation, it appears that the problem is related to Torque.
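
For reference, a minimal MPI test program along the following lines (a sketch only, not the actual MPI_li_64 source, which is not shown here) reports rank and host much like the hostname run above, and could help separate a launch problem under Torque from a crash inside the application itself:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                 /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Get_processor_name(name, &len);     /* host this rank landed on */

    printf("rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

Built with mpicc and started with the same mpirun commands as above, it would show whether even a trivial 64-bit MPI program hits the segfault on n89 when launched from inside the Torque job.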

Any ideas?
