Hi,
On 25.03.2010 at 22:34, Matthew MacManes wrote:
I am having an OpenMPI issue that seems to be related to job
scheduling on TACC, one of the TeraGrid resources.
The program I am trying to run, ABySS, seems to run fine without
scheduling, i.e. when I run it on the login nodes without going
through qsub. But when I use that same command and schedule it
through qsub, the job fails.
Here is the qsub script, fyi:
#!/bin/bash
#$ -N homo47
#$ -j y
#$ -o homo47
#$ -pe 16way 128
#$ -q normal
#$ -l h_rt=00:30:00
#$ -M macma...@gmail.com
#$ -m be
cd /work/01301/mmacmane/abyss-1.1.2/bin
#$ -cwd
Most likely one of the two lines above would be sufficient; -cwd would
also do a `cd` to the current working directory.
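As a minimal sketch of that suggestion, the header could keep only -cwd and drop the explicit `cd`. This assumes the job is submitted from the directory it should run in, and that the scheduler on this system honours -cwd as standard SGE does; the `echo` line is only added here to confirm the directory and is not part of the original script.

```shell
#!/bin/bash
#$ -N homo47
#$ -cwd   # assumption: start the job in the directory qsub was called from
#$ -V
# With -cwd set, no explicit cd should be needed; print the directory to confirm.
echo "job working directory: $(pwd)"
```

The other option is to drop -cwd and keep only the `cd /work/01301/mmacmane/abyss-1.1.2/bin` line.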
#$ -V
ibrun ./abyss-pe k=19 in='/work/01301/mmacmane/homo/*.fastq' \
    name='homo_47' n=5 s=200 c=13
What is `ibrun` doing in detail? Is this something you have to use to
run a job in the Grid?
I get an error message:
TACC: Done.
TACC: Starting up job 1299149
TACC: Setting up parallel environment for OpenMPI mpirun.
TACC: Setup complete. Running job script.
TACC: starting parallel tasks...
/opt/apps/pgi7_2/openmpi/1.3/bin/mpirun -np 64 ABYSS-P
Was your application also compiled with Open MPI 1.3, i.e. do you use
the same mpirun when you start it on the command line?
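One way to check this is to print which mpirun the environment resolves to, once on the login node and once from inside the job script, and compare the two. The following is only a sketch: the helper name `report_mpirun` is made up for illustration, and it relies on `mpirun --version`, which Open MPI's mpirun provides.

```shell
#!/bin/bash
# Hypothetical helper: report which mpirun the current environment resolves to.
report_mpirun() {
    local mpirun_path
    if mpirun_path=$(command -v mpirun); then
        echo "mpirun found at: $mpirun_path"
        # Open MPI's mpirun prints its version with --version; show the first line.
        mpirun --version 2>&1 | head -n 1
    else
        echo "mpirun not found in PATH"
    fi
}

report_mpirun
```

If the path or version differs between the login node and the batch job, the binary may be started with a different Open MPI than the one it was built against.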
-k19 -c13 --coverage-hist=coverage.hist -s bubbles.fa -o homo_61-1.fa
/work/01301/mmacmane/homo/SRR001665_1.fastq
/work/01301/mmacmane/homo/SRR001665_2.fastq
/work/01301/mmacmane/homo/SRR002271_1.fastq
/work/01301/mmacmane/homo/SRR002271_2.fastq
/work/01301/mmacmane/homo/SRR002273_1.fastq
/work/01301/mmacmane/homo/SRR002273_2.fastq
/work/01301/mmacmane/homo/SRR002274_1.fastq
/work/01301/mmacmane/homo/SRR002274_2.fastq
/work/01301/mmacmane/homo/SRR002275_1.fastq
/work/01301/mmacmane/homo/SRR002275_2.fastq
/work/01301/mmacmane/homo/SRR002276_1.fastq
/work/01301/mmacmane/homo/SRR002276_2.fastq
/work/01301/mmacmane/homo/SRR002291_1.fastq
/work/01301/mmacmane/homo/SRR002291_2.fastq
/work/01301/mmacmane/homo/SRR002295_1.fastq
/work/01301/mmacmane/homo/SRR002295_2.fastq
/work/01301/mmacmane/homo/SRR002297_1.fastq
/work/01301/mmacmane/homo/SRR002297_2.fastq
/work/01301/mmacmane/homo/SRR029337_1.fastq
/work/01301/mmacmane/homo/SRR029337_2.fastq
This comes from the expansion of the *. Do you want to pass the
expression including the * to your application? (In that case the
expansion by `ibrun` must be avoided.)
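To illustrate the difference, here is a small self-contained sketch of how an unquoted pattern gets expanded by the shell while a quoted one is passed through literally. The scratch directory, the file names and the `count_args` helper are made up for the demonstration; where exactly the expansion happens in the `ibrun`/abyss-pe chain is not shown here.

```shell
#!/bin/bash
# Hypothetical demo: a scratch directory with two empty files standing in for the reads.
tmpdir=$(mktemp -d)
touch "$tmpdir/a_1.fastq" "$tmpdir/a_2.fastq"

# Helper that reports how many arguments it received (illustration only).
count_args() { echo "$#"; }

pattern="$tmpdir/*.fastq"

# Unquoted: the shell expands the pattern before the command sees it.
expanded=$(count_args $pattern)

# Quoted: the command receives one literal argument that still contains the '*'.
literal=$(count_args "$pattern")

echo "unquoted -> $expanded argument(s)"
echo "quoted   -> $literal argument(s)"

rm -rf "$tmpdir"
```

Every layer that re-evaluates the command line without quotes can trigger the expansion again, so the quoting has to survive each wrapper that the argument passes through.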
-- Reuti
...many many lines of this...
[i178-302.ranger.tacc.utexas.edu:28340] [[5795,1],19]
ORTE_ERROR_LOG: A message is attempting to be sent to a process
whose contact information is unknown in file rml_oob_send.c at line
105
[i178-302.ranger.tacc.utexas.edu:28340] [[5795,1],19] could not get
route to [[INVALID],INVALID]
[i178-302.ranger.tacc.utexas.edu:28340] [[5795,1],19]
ORTE_ERROR_LOG: A message is attempting to be sent to a process
whose contact information is unknown in file base/plm_base_proxy.c
at line 85
[i176-303.ranger.tacc.utexas.edu:05045] [[5795,1],1] ORTE_ERROR_LOG:
A message is attempting to be sent to a process whose contact
information is unknown in file rml_oob_send.c at line 105
[i176-303.ranger.tacc.utexas.edu:05045] [[5795,1],1] could not get
route to [[INVALID],INVALID]
[i176-303.ranger.tacc.utexas.edu:05045] [[5795,1],1] ORTE_ERROR_LOG:
A message is attempting to be sent to a process whose contact
information is unknown in file base/plm_base_proxy.c at line 85
[i178-302.ranger.tacc.utexas.edu:28325] [[5795,1],18]
ORTE_ERROR_LOG: A message is attempting to be sent to a process
whose contact information is unknown in file rml_oob_send.c at line
105
[i178-302.ranger.tacc.utexas.edu:28325] [[5795,1],18] could not get
route to [[INVALID],INVALID]
...many many lines of this...
TACC: Cleaning up after job: 1299149
TACC: Done.
The TACC systems administrators don't seem to have a great solution,
and the authors of the program say it's MPI-related...
_________________________________
Matthew MacManes
PhD Candidate
University of California- Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website: http://ib.berkeley.edu/labs/lacey
Personal Website: http://macmanes.com/