Works like a charm, thanks Tim!

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Tim Prins
Sent: 01 April 2007 14:50
To: Open MPI Users
Subject: Re: [OMPI users] Torque/OpenMPI[Scanned]

Hi Barry,

The problem is the line:
ncpus=`wc -l $PBS_NODEFILE`

wc will print out the file name after the count. So ncpus gets
"16 /var/spool/torque/aux//350.wc01" and your mpirun command will look like:
mpirun -np 16 /var/spool/torque/aux//350.wc01 /home/test/hpcc-1.0.0/hpcc

So mpirun will treat /var/spool/torque/aux//350.wc01 as the program to
launch and try to execute it, with your hpcc path passed as its argument.
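
To see the effect without a Torque job, here is a minimal sketch. It assumes a temporary file stands in for $PBS_NODEFILE, and the host names in it are made up for illustration:

```shell
# Stand-in for $PBS_NODEFILE: a temporary file with 16 lines
# (4 hypothetical hosts, 4 slots each).
nodefile=$(mktemp)
for i in 1 2 3 4; do
    for j in 1 2 3 4; do echo "node00$i"; done
done > "$nodefile"

# Same form as the original script: wc gets the file name as an argument,
# so its output is the count followed by the file name.
ncpus=`wc -l $nodefile`

# Expanded unquoted, $ncpus splits into two words, which is how the
# file name ends up as a separate argument on the mpirun command line.
set -- $ncpus
echo "words in ncpus: $#"
echo "first word:  $1"
echo "second word: $2"

rm -f "$nodefile"
```

The second word is the node file path, which lands in the position where mpirun expects the executable.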

One solution is to leave out -np altogether: Open MPI will then run on
every available slot. So you could just use the script:

HPCC_HOME=/home/test/hpcc-1.0.0
mpirun $HPCC_HOME/hpcc

This will launch one process on every CPU reported by Torque.

Alternatively, you could have wc read from stdin instead of from a file:
ncpus=`wc -l < $PBS_NODEFILE`

This avoids the filename being printed, so ncpus holds only the count.
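
A small sketch of the stdin variant, again assuming a temporary file in place of $PBS_NODEFILE. The `tr` step is my addition, not part of the original script: some wc implementations pad the count with spaces, and stripping them keeps the value a bare number. The mpirun line is only echoed here, not run:

```shell
# Stand-in for $PBS_NODEFILE: a temporary file with 16 lines.
nodefile=$(mktemp)
i=0
while [ "$i" -lt 16 ]; do
    echo "node"
    i=$((i + 1))
done > "$nodefile"

# wc reads from stdin, so it has no file name to print.
# tr removes any padding spaces some wc implementations add.
ncpus=$(wc -l < "$nodefile" | tr -d ' ')

# Show the command line that would be run, rather than running it.
HPCC_HOME=/home/test/hpcc-1.0.0
echo "mpirun -np $ncpus $HPCC_HOME/hpcc"

rm -f "$nodefile"
```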

Hope this helps,

Tim

On Apr 1, 2007, at 9:16 AM, Barry Evans wrote:

> Hello,
>
>
>
> Having a bit of trouble running Open MPI 1.2 under Torque 2.1.8.
>
>
>
> My Script contains the following:
>
> -----------------------------------------------
>
> HPCC_HOME=/home/test/hpcc-1.0.0
>
> ncpus=`wc -l $PBS_NODEFILE`
>
> mpirun -np $ncpus $HPCC_HOME/hpcc
>
> -----------------------------------------------
>
>
>
>
>
> When I try to run on 4 nodes, 4 cpus each I receive the following  
> in my err file:
>
>
>
> [node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file  
> odls_default_module.c at line 1188
>
> [node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file  
> odls_default_module.c at line 1188
>
> [node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file  
> odls_default_module.c at line 1188
>
> --------------------------------------------------------------------------
>
> Failed to find or execute the following executable:
>
>
>
> Host:       node007
>
> Executable: /var/spool/torque/aux//350.wc01
>
>
>
> Cannot continue.
>
> --------------------------------------------------------------------------
>
> [no--------------------------------------------------------------------------
>
> Failed to find or execute the following executable:
>
>
>
> Host:       node004
>
> Executable: /var/spool/torque/aux//350.wc01
>
>
>
> Cannot continue.
>
> --------------------------------------------------------------------------
>
> de004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file  
> odls_default_module.c at line 1188
>
> --------------------------------------------------------------------------
>
> Failed to find or execute the following executable:
>
>
>
> Host:       node003
>
> Executable: /var/spool/torque/aux//350.wc01
>
>
>
> Cannot continue.
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>
> Failed to find or execute the following executable:
>
>
>
> Host:       node008
>
> Executable: /var/spool/torque/aux//350.wc01
>
>
>
> Cannot continue.
>
> --------------------------------------------------------------------------
>
> [node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file orted.c  
> at line 588
>
> [node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file orted.c  
> at line 588
>
> [node004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file orted.c  
> at line 588
>
> [node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file orted.c  
> at line 588
>
>
>
>
>
> Has anyone seen this before? It seems odd that openmpi would be  
> trying to execute what is effectively the host file. I stuck a  
> sleep in to make sure the file was being distributed, and sure  
> enough, it was there. I am able to run mvapich through torque  
> without issue and openmpi from the command line.
>
>
>
> Cheers,
>
> Barry Evans
>
> Technical Manager
>
> OCF plc
>
> +44 (0)7970 148 121
>
> bev...@ocf.co.uk
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
