On Wed, Aug 22, 2007 at 03:31:20PM +0300, Noam Meltzer wrote:
> Hi,
>
> I am running openmpi-1.2.3 compiled for 64bit on RHEL4u4.
> I also have a Voltaire InfiniBand interconnect.
> When I manually run jobs using the following command:
>
> /opt/local/openmpi-1.2.3-gcc4/bin/orterun -np 8 -hostfile ~/myHostList
> -mca btl self,openib /tcc/eandm/performance/igor/main.exe.openmpi123
>
> The job is executed just fine..
>
> Though, when run through SGE I have the weirdest problem, and get the
> following error (on all hosts in my list):
> --------------------------------------------------------------------------
> The OpenIB BTL failed to initialize while trying to create an internal
> queue. This typically indicates a failed OpenFabrics installation or
> faulty hardware. The failure occured here:
>
> Host: node4.grid.technion.ac.il
> OMPI source: btl_openib.c:828
> Function: ibv_create_cq()
> Error: Invalid argument (errno=22)
> Device: mthca0
>
> You may need to consult with your system administrator to get this
> problem fixed.
> --------------------------------------------------------------------------
>
> To send a job to the grid I use the following command:
> qrsh -cwd -q noam.q -pe orte 8 ./myScript
>
> while "myScript" looks like:
>
> #!/bin/bash
> /opt/local/openmpi-1.2.3-gcc4/bin/orterun -np $NSLOTS -mca btl
> self,openib /tcc/eandm/performance/igor/main.exe.openmpi123
>
> If I change "openib" to "tcp" (in myScript) everything works just fine.
>
> Any ideas?
>
Perhaps SGE doesn't set locked memory limit properly.
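
One way to check this guess, as a rough sketch: print the limit from inside the job script and compare it with what an interactive login on the same node reports. The openib BTL has to register (pin) memory, and that counts against the locked memory limit, so a small value under SGE would fit the ibv_create_cq() failure. The variable name memlock_limit is only illustrative, and the snippet assumes bash, as in myScript.

```shell
#!/bin/bash
# Report the locked-memory limit (RLIMIT_MEMLOCK) seen by this shell.
# bash prints the value in kbytes, or the word "unlimited".
memlock_limit=$(ulimit -l)

# uname -n gives the node name, so output from several hosts can be told apart.
echo "$(uname -n): max locked memory = ${memlock_limit}"
```

Putting these lines at the top of myScript (before the orterun line) and submitting with the same qrsh command shows the limit the job actually runs with. If it is much lower than in a normal login shell on that node, the limit is a likely cause; if the two match, the problem is probably elsewhere.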
-- Gleb.