Are you able to run non-MPI programs like "hostname"?

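I ask because that error message indicates that everything started just fine:
the processes launched and got as far as MPI_Comm_rank, so the error is in
your application. Note that the Fortran binding is
MPI_COMM_RANK(comm, rank, ierr); if your test really calls MPI_COMM_RANK(ierr),
the first argument is not a valid communicator, which is exactly what
MPI_ERR_COMM reports.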
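
A minimal sketch of the "hostname" check, assuming the mpirun path from your qsub.sh. The MPIRUN variable and the fallback of 2 processes are only for illustration; adjust them to your install:

```shell
#!/bin/bash
# Sanity check: launch a non-MPI program through mpirun before debugging MPI code.
# MPIRUN defaults to the path quoted in qsub.sh (an assumption; override if needed).
MPIRUN="${MPIRUN:-/home/jason/openmpi-1.4.3-install/bin/mpirun}"

if [ -x "$MPIRUN" ]; then
    # NSLOTS is set by SGE inside a job; fall back to 2 for an interactive test.
    "$MPIRUN" -np "${NSLOTS:-2}" hostname
    launch_status=$?
else
    # No launcher at that path: report it with the usual "command not found" code.
    echo "mpirun not found at $MPIRUN" >&2
    launch_status=127
fi
echo "launcher exit status: $launch_status"
```

If this prints one hostname per slot when submitted with qsub, the launch side is working and the problem is in the test program itself.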


On Apr 6, 2011, at 6:01 PM, Jason Palmer wrote:

> Btw, I did compile openmpi with the --with-sge flag.
> 
> I am able to compile a test program using openf90 with no errors or
> warnings. But when I try to run a test program that just calls
> MPI_INIT(ierr), then MPI_COMM_RANK(ierr), I get the following, whether
> statically or dynamically linked, and whether run with mpirun or directly:
> 
> [juggling.ucsd.edu:20218] *** An error occurred in MPI_Comm_rank
> [juggling.ucsd.edu:20218] *** on communicator MPI_COMM_WORLD
> [juggling.ucsd.edu:20218] *** MPI_ERR_COMM: invalid communicator
> [juggling.ucsd.edu:20218] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> 
> Is there something missing in the Linux or parallel environment settings?
> Thanks.
> 
> -----Original Message-----
> From: Jason Palmer [mailto:japalme...@gmail.com] 
> Sent: Wednesday, April 06, 2011 4:09 PM
> To: 'Open MPI Users'
> Subject: SGE and openmpi
> 
> Hi,
> I am having trouble running a batch job in SGE using openmpi. I have read
> the FAQ, which says that openmpi will automatically do the right thing, but
> something seems to be wrong.
> 
> Previously I used MPICH1 under SGE without any problems. I'm avoiding MPICH2
> because it doesn't seem to support static compilation, whereas I was able to
> get openmpi to compile with open64 and compile my program statically.
> 
> But I am having problems launching. According to the documentation, I should
> be able to have a script file, qsub.sh:
> 
> #!/bin/bash
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -q all.q
> #$ -pe orte 18
> MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
> /home/jason/openmpi-1.4.3-install/bin/mpirun -np $NSLOTS  myprog
> 
> Then,
>       $ qsub  qsub.sh
> 
> Previously with MPICH1 I would have
> 
>       -machinefile $TMP/machines
> 
> in the mpirun arguments, and the rest of the script the same except -pe
> mpich 18, and it would work. The -machinefile argument doesn't seem to work
> in orte. The error in qsub.sh.o is:
> 
> [jason@juggling ~/amica_open64]$ cat qsub.sh.o7514
> [compute-0-0.local:17792] *** An error occurred in MPI_Comm_rank
> [compute-0-0.local:17792] *** on communicator MPI_COMM_WORLD
> [compute-0-0.local:17792] *** MPI_ERR_COMM: invalid communicator
> [compute-0-0.local:17792] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 17792 on node
> compute-0-0.local exiting without calling "finalize". This may have caused
> other processes in the application to be terminated by signals sent by
> mpirun (as reported here).
> --------------------------------------------------------------------------
> [compute-0-0.local:17788] 8 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
> [compute-0-0.local:17788] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> 
> 
> I ran qconf, and I get the same output as in the documentation:
> 
> [jason@juggling ~/amica_open64]$ qconf -sp orte
> pe_name            orte
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> 
> The qconf mpich output is:
> 
> [jason@juggling ~/amica_open64]$ qconf -sp mpich
> pe_name            mpich
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> 
> with specific scripts for start_proc_args and stop_proc_args ...
> 
> Am I missing something necessary to run openmpi under SGE?
> 
> Thanks very much,
> Jason
> 

