Re: [OMPI users] Ompi runs thru cmd line but fails when run thru SGE

Reuti Sat, 24 Jan 2009 15:52:13 -0500

On 24.01.2009 at 17:12, Jeremy Stout wrote:

The RLIMIT error is very common when using OpenMPI + OFED + Sun Grid
Engine. You can find more information and several remedies here:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages


I usually resolve this problem by adding "ulimit -l unlimited" near
the top of the SGE startup script on the computation nodes and
restarting SGE on every node.
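
To confirm whether the limit actually reaches the job, one option is to print the locked-memory limit from inside an SGE job and compare it with the value seen in an interactive shell on the same node. The following is a minimal sketch using Python's standard resource module; the helper names describe_memlock and memlock_limits are made up for illustration and are not part of Open MPI or SGE.

```python
import resource


def describe_memlock(limit):
    """Return a readable form of an RLIMIT_MEMLOCK value given in bytes."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    # A small soft limit such as 32768 bytes matches the libibverbs warning.
    print("RLIMIT_MEMLOCK soft:", describe_memlock(soft))
    print("RLIMIT_MEMLOCK hard:", describe_memlock(hard))
```

If the script reports a small value when submitted through SGE but "unlimited" when run from a login shell, the limit is presumably being inherited from the SGE execution daemon rather than from the user's shell settings.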

Did you request/set any limits with SGE's h_vmem/h_stack resource request?


-- Reuti

Jeremy Stout
On Sat, Jan 24, 2009 at 6:06 AM, Sangamesh B <forum....@gmail.com> wrote:
Hello all,

 Open MPI 1.3 is installed on a Rocks 4.3 Linux cluster with SGE
support, i.e. configured using --with-sge.
But ompi_info shows only one gridengine component:
# /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
Is this right? I ask because the SGE qmaster daemon was not running
during the Open MPI installation.

Now the problem is that Open MPI parallel jobs submitted through
gridengine fail (when run on multiple nodes) with this error:

$ cat err.26.Helloworld-PRL
ssh_exchange_identification: Connection closed by remote host
--------------------------------------------------------------------------
A daemon (pid 8462) died unexpectedly with status 129 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

When the job runs on a single node, it runs fine and produces the
output, but with an error:
$ cat err.23.Helloworld-PRL
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

 Local host:   node-0-4.local
 Local device: mthca0
--------------------------------------------------------------------------
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
   This will severely limit memory registrations.
[node-0-4.local:07869] 7 more processes have sent help message
help-mpi-btl-openib.txt / error in device init
[node-0-4.local:07869] Set MCA parameter "orte_base_help_aggregate" to
0 to see all help / error messages

What could be causing this behavior?

Thanks,
Sangamesh
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users