Hi,

On 25.03.2010 at 22:34, Matthew MacManes wrote:

I am having an OpenMPI issue that seems to be related to job scheduling on TACC, one of the TeraGrid resources.

The program I am trying to run, ABySS, seems to run fine without scheduling, i.e. when I run it on the login nodes without going through qsub. But when I submit the same command through qsub, the job fails.

Here is the qsub script, fyi:

#!/bin/bash
#$ -N homo47
#$ -j y
#$ -o homo47
#$ -pe 16way 128
#$ -q normal    


#$ -l h_rt=00:30:00     
#$ -M   macma...@gmail.com
#$ -m be
cd /work/01301/mmacmane/abyss-1.1.2/bin
#$ -cwd
Most likely one of the two lines above would be sufficient: `-cwd` also makes the job start in the current working directory, i.e. the directory you submitted it from, so it acts as an implicit `cd`.

#$ -V
ibrun ./abyss-pe k=19 in='/work/01301/mmacmane/homo/*.fastq' name='homo_47' n=5 s=200 c=13
What is `ibrun` doing in detail? Is this something you have to use to run a job in the Grid?
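One way to find out on the login node what `ibrun` actually is (a shell function, a wrapper script or a binary) is sketched here. `inspect_command` is a hypothetical helper name; it only relies on bash's `type -t`, `command -v`, `grep` and `head`, so it makes no assumption about what TACC's `ibrun` contains.

```shell
#!/bin/bash
# Hypothetical helper: report what a command name resolves to and, if it is
# a text file (e.g. a shell or Perl wrapper), print its first lines.
inspect_command() {
    local name=$1 lines=${2:-20}
    local kind path

    # type -t prints alias/keyword/function/builtin/file, or fails if unknown.
    kind=$(type -t "$name") || { echo "$name: not found"; return 1; }
    echo "$name is a $kind"
    [ "$kind" = "file" ] || return 0

    path=$(command -v "$name")
    echo "path: $path"
    # grep -I treats binary files as non-matching, so this branch is taken
    # only for non-empty text files.
    if grep -qI '' "$path"; then
        head -n "$lines" "$path"
    else
        echo "(binary or empty file, not printed)"
    fi
}

# Usage on the login node:
#   inspect_command ibrun 40
```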

I get an error message:
TACC: Done.
TACC: Starting up job 1299149
TACC: Setting up parallel environment for OpenMPI mpirun.
TACC: Setup complete. Running job script.
TACC: starting parallel tasks...
/opt/apps/pgi7_2/openmpi/1.3/bin/mpirun -np 64 ABYSS-P
Was your application also compiled with Open MPI 1.3, i.e. do you use the same mpirun when you start it on the command line?
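A minimal sketch for checking this, assuming `ABYSS-P` is dynamically linked and the Open MPI installation uses the usual `<prefix>/bin/mpirun` layout: it compares the `mpirun` found in `PATH` with the `libmpi` the binary resolves to. `check_mpi_match` is a hypothetical helper name. Running it both on the login node and from inside a job script would show whether the two environments differ.

```shell
#!/bin/bash
# Hypothetical helper: compare the Open MPI installation that provides
# `mpirun` with the libmpi that an executable is dynamically linked against.
# Assumes the usual <prefix>/bin/mpirun layout of an Open MPI install.
check_mpi_match() {
    local exe_name=$1
    local mpirun_path exe_path prefix libs

    mpirun_path=$(command -v mpirun) || { echo "mpirun not in PATH"; return 1; }
    exe_path=$(command -v "$exe_name") || { echo "$exe_name not in PATH"; return 1; }
    prefix=${mpirun_path%/bin/mpirun}

    echo "mpirun: $mpirun_path"
    # Open MPI's mpirun reports its release with --version.
    mpirun --version 2>&1 | head -n 1

    # Keep only the MPI library lines from the dynamic-linker listing.
    libs=$(ldd "$exe_path" 2>/dev/null | grep -i 'libmpi' || true)
    if [ -z "$libs" ]; then
        echo "no libmpi in ldd output (static link, or not an MPI binary?)"
    elif printf '%s\n' "$libs" | grep -qF "$prefix/"; then
        echo "libmpi resolves under $prefix: matches mpirun"
    else
        echo "libmpi resolves outside $prefix: possible Open MPI mismatch"
        printf '%s\n' "$libs"
    fi
}

# Usage, e.g. on the login node and again inside the job script:
#   check_mpi_match ABYSS-P
```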

-k19 -c13 --coverage-hist=coverage.hist -s bubbles.fa -o homo_61-1.fa /work/01301/mmacmane/homo/SRR001665_1.fastq /work/01301/mmacmane/homo/SRR001665_2.fastq /work/01301/mmacmane/homo/SRR002271_1.fastq /work/01301/mmacmane/homo/SRR002271_2.fastq /work/01301/mmacmane/homo/SRR002273_1.fastq /work/01301/mmacmane/homo/SRR002273_2.fastq /work/01301/mmacmane/homo/SRR002274_1.fastq /work/01301/mmacmane/homo/SRR002274_2.fastq /work/01301/mmacmane/homo/SRR002275_1.fastq /work/01301/mmacmane/homo/SRR002275_2.fastq /work/01301/mmacmane/homo/SRR002276_1.fastq /work/01301/mmacmane/homo/SRR002276_2.fastq /work/01301/mmacmane/homo/SRR002291_1.fastq /work/01301/mmacmane/homo/SRR002291_2.fastq /work/01301/mmacmane/homo/SRR002295_1.fastq /work/01301/mmacmane/homo/SRR002295_2.fastq /work/01301/mmacmane/homo/SRR002297_1.fastq /work/01301/mmacmane/homo/SRR002297_2.fastq /work/01301/mmacmane/homo/SRR029337_1.fastq /work/01301/mmacmane/homo/SRR029337_2.fastq
This comes from the expansion of the `*`. Do you want to pass the expression including the `*` to your application? In that case, the expansion by `ibrun` must be avoided.
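As an illustration of how a single-quoted `*` can still end up expanded: quoting only protects the pattern from the first shell that parses it, and any layer that forwards its arguments unquoted lets the shell glob them again. The two forwarding functions are hypothetical stand-ins, and which layer does this in the TACC setup is an assumption still to be checked.

```shell
#!/bin/bash
# Sketch: a quoted pattern reaches the first command literally, but any
# layer that forwards its arguments unquoted re-expands the '*'.
# Both forwarders are hypothetical stand-ins, not TACC's ibrun.

# Forwards arguments verbatim: no word splitting, no pathname expansion.
quoted_forward() {
    printf '%s\n' "$@"
}

# Forwards arguments unquoted: the shell globs them again.
unquoted_forward() {
    printf '%s\n' $@
}

# Scratch directory with two files standing in for the .fastq inputs.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/reads_1.fastq" "$demo_dir/reads_2.fastq"

pattern="$demo_dir/*.fastq"   # no pathname expansion in an assignment

echo "quoted:";   quoted_forward "$pattern"     # one line: the literal pattern
echo "unquoted:"; unquoted_forward "$pattern"   # two lines: the file names
```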

-- Reuti

...many many lines of this...
[i178-302.ranger.tacc.utexas.edu:28340] [[5795,1],19] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
[i178-302.ranger.tacc.utexas.edu:28340] [[5795,1],19] could not get route to [[INVALID],INVALID]
[i178-302.ranger.tacc.utexas.edu:28340] [[5795,1],19] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/plm_base_proxy.c at line 85
[i176-303.ranger.tacc.utexas.edu:05045] [[5795,1],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
[i176-303.ranger.tacc.utexas.edu:05045] [[5795,1],1] could not get route to [[INVALID],INVALID]
[i176-303.ranger.tacc.utexas.edu:05045] [[5795,1],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/plm_base_proxy.c at line 85
[i178-302.ranger.tacc.utexas.edu:28325] [[5795,1],18] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
[i178-302.ranger.tacc.utexas.edu:28325] [[5795,1],18] could not get route to [[INVALID],INVALID]

...many many lines of this...
TACC: Cleaning up after job: 1299149
TACC: Done.
The TACC systems administrators don't seem to have a great solution, and the authors of the program say it's MPI-related...

_________________________________
Matthew MacManes
PhD Candidate
University of California- Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website: http://ib.berkeley.edu/labs/lacey
Personal Website: http://macmanes.com/
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
