Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster

2011-01-06 Thread Jeff Squyres
That might well be a good idea (create an MCA param for the number of send / receive CQEs). It certainly seems that OMPI shouldn't be scaling *any* IB resource based on the number of peer processes without at least some kind of upper bound. Perhaps an IB vendor should reply here... On Dec

Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster

2010-12-31 Thread Gilbert Grosdidier
Bonjour, Back to this painful issue, partly because I found a workaround, and partly because I would like to help. The initial post was: http://www.open-mpi.org/community/lists/users/2010/11/14843.php where I reported on OMPI 1.4.1, but it was the same for 1.4.3. I spotted the culprit

Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster

2010-11-29 Thread Gilbert Grosdidier
Bonjour John, Thanks for your feedback, but my investigations so far did not help: the memlock limits on the compute nodes are actually set to unlimited. This most probably means that even if the btl_openib hits some memory allocation limit, the message I got is inaccurate because the memlock re

Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster

2010-11-25 Thread Gilbert Grosdidier
Bonjour John, Thanks for your feedback, but my investigations so far did not help: the memlock limits on the compute nodes are actually set to unlimited. This most probably means that even if the btl_openib hits some memory allocation limit, the message I got is inaccurate because the memlock reso
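
The limit shown by a login shell is not necessarily the one an MPI process inherits when it is started through a batch system daemon, so it can help to query the limit from inside the job itself. The following is a minimal sketch, assuming Python is available on the compute nodes; the helper names `describe_limit` and `memlock_limits` are hypothetical and are not part of Open MPI or PBS.

```python
import resource


def describe_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    # These are the limits this process actually inherited from its parent,
    # which may differ from what an interactive shell reports on the same node.
    soft, hard = memlock_limits()
    print(f"memlock soft limit: {describe_limit(soft)}")
    print(f"memlock hard limit: {describe_limit(hard)}")
```

Running the script through the same launcher and batch script as the failing job, rather than from an interactive shell, shows the values the MPI ranks would see.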

Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster (fwd)

2010-11-20 Thread John Hearns
On 20 November 2010 16:31, Gilbert Grosdidier wrote: > Bonjour, Bonjour Gilbert. I manage ICE clusters also. Please could you have a look at /etc/init.d/pbs on the compute blades? Do you have something like: if [ "${PBS_START_MOM}" -gt 0 ] ; then if check_prog "mom" ; then e

[OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster (fwd)

2010-11-20 Thread Gilbert Grosdidier
Bonjour, I am afraid I got a weird issue when running an OpenMPI job using OpenIB on an SGI ICE cluster with 4096 cores (or larger), and the FAQ does not help. The OMPI version is 1.4.1, and it is running just fine with a smaller number of cores (up to 512). The error message is the following