Daryl,

Try this:


-------- Original Message --------
Subject: RE: only root running mpi jobs with 1.0.1rc5
List-Post: users@lists.open-mpi.org
Date: Thu, 01 Dec 2005 18:49:46 -0700
From: Joshua Aune <lu...@lnxi.com>
Reply-To: lu...@lnxi.com
Organization: Linux Networx
To: Todd Wilde <t...@mellanox.com>
CC: Matthew Finlay <m...@mellanox.com>, twood...@lanl.gov, Robert Cummins <rcumm...@lnxi.com>, Pat Lindsay <plind...@lnxi.com>
References: <25AE7F432672D511B8DC00B0D0DF11DA05FC26CB@MTIEX01>

Sounds like you were right

*               soft    memlock         8388608 # 8 GB
*               hard    memlock         8388608 # 8 GB


and now I get no errors :)  Looks like the limits were propagated to the
back end nodes.
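
One way to confirm that the new limits actually reached a compute node is to print them as the non-root user on that node. This is only a minimal sketch, assuming a bash-like shell where `ulimit -l` reports kilobytes (the same unit limits.conf uses for memlock); the helper name memlock_kb_to_gb is made up for illustration.

```shell
# Hypothetical helper: convert a memlock value in kilobytes
# (the unit used in limits.conf) to whole gigabytes.
memlock_kb_to_gb() {
    echo $(( $1 / 1024 / 1024 ))
}

# Show the locked-memory limits the current user actually has.
# Run this as the normal user, not as root.
echo "soft memlock: $(ulimit -S -l)"
echo "hard memlock: $(ulimit -H -l)"

# Sanity-check the value from the limits.conf lines above.
echo "8388608 KB is $(memlock_kb_to_gb 8388608) GB"
```

If the user still sees a much smaller value than the one set in limits.conf, the change has probably not taken effect for that login session yet.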

Tim, this should fix your problem as well?

On Thu, 2005-12-01 at 17:26 -0800, Todd Wilde wrote:
> How about this one:
>
> For Redhat AS4.0 and Fedora Core 3 or a newer kernel, edit the
> file /etc/security/limits.conf and add the following two lines:
>
> * soft memlock <number>
>
> * hard memlock <number>
>
> The <number> value denotes the number of kilobytes that may be locked
> by a process.
>
> > -----Original Message-----
> > From: Joshua Aune [mailto:lu...@lnxi.com]
> > Sent: Thursday, December 01, 2005 3:50 PM
> > To: Todd Wilde
> > Cc: Matthew Finlay; twood...@lanl.gov; Robert Cummins; Pat Lindsay
> > Subject: RE: only root running mpi jobs with 1.0.1rc5
> >
> > On Thu, 2005-12-01 at 15:39 -0800, Todd Wilde wrote:
> > > It may be a permissions issue with normal users locking memory.  I've
> > > seen this in the past.  Try adding the following command at boot:
> > >
> > >
> > > sysctl -w vm.disable_cap_mlock=1
> >
> > This doesn't exist in 2.6.14...
> >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Joshua Aune [mailto:lu...@lnxi.com]
> > > > Sent: Thursday, December 01, 2005 1:56 PM
> > > > To: Matthew Finlay; Todd Wilde; twood...@lanl.gov
> > > > Cc: Robert Cummins; Pat Lindsay
> > > > Subject: only root running mpi jobs with 1.0.1rc5
> > > >
> > > > Root runs jobs fine but users don't.
> > > >
> > > > Any thoughts?
> > > >
> > > > Thanks,
> > > > josh
> > > >
> > > > coyote2-compute# module purge
> > > > coyote2-compute# module load compiler/gcc mpi/openmpi-1.0.1rc5
> > > > coyote2-compute# cd /home/luken/hello
> > > > coyote2-compute# mpirun -np 2 -H 201,202 mpi_hello
> > > > n201: I am rank 0
> > > > n202: I am rank 1
> > > >
> > > >
> > > > coyote2-compute$ su - luken
> > > > coyote2-compute$ module purge
> > > > coyote2-compute$ module load compiler/gcc mpi/openmpi-1.0.1rc5
> > > > coyote2-compute$ cd /home/luken/hello
> > > > coyote2-compute$ mpirun -np 2 -H 201,202 mpi_hello
> > > > [0,1,0][btl_openib.c:803:mca_btl_openib_module_init] error creating high
> > > > priority cq for mthca0 errno says Cannot allocate memory
> > > > [0,1,1][btl_openib.c:803:mca_btl_openib_module_init] error creating high
> > > > priority cq for mthca0 errno says Cannot allocate memory
> > > >
> > > > n201: I am rank 0
> > > >
> > > > n202: I am rank 1
> > >
>




Daryl W. Grunau wrote:
Hi, I'm running OMPI 1.1a1r8378 on 2.6.14 + recent OpenIB stack and getting
the following runtime error:

[0,1,0][btl_openib.c:803:mca_btl_openib_module_init] error creating high 
priority cq for mthca0 errno says Cannot allocate memory
[0,1,3][btl_openib.c:803:mca_btl_openib_module_init] error creating high 
priority cq for mthca0 errno says Cannot allocate memory
[0,1,1][btl_openib.c:803:mca_btl_openib_module_init] error creating high 
priority cq for mthca0 errno says Cannot allocate memory
[0,1,2][btl_openib.c:803:mca_btl_openib_module_init] error creating high 
priority cq for mthca0 errno says Cannot allocate memory


Strange thing is that it works properly when I run as root.  A permissions
problem on my part?  My devices look like:

   # ls -l /dev/infiniband/*
   crw-------  1 root root 231,  64 Dec  5 17:16 /dev/infiniband/issm0
   crw-------  1 root root 231,  65 Dec  5 17:16 /dev/infiniband/issm1
   crw-------  1 root root 231,   0 Dec  5 17:16 /dev/infiniband/umad0
   crw-------  1 root root 231,   1 Dec  5 17:16 /dev/infiniband/umad1
   crw-rw-rw-  1 root root 231, 192 Dec  5 17:16 /dev/infiniband/uverbs0

Daryl