That might well be a good idea (i.e., create an MCA parameter controlling the number of send/receive CQEs).
It certainly seems that OMPI shouldn't be scaling *any* IB resource based on the number of peer processes without at least some kind of upper bound. Perhaps an IB vendor should reply here... (Some back-of-the-envelope numbers, and a sanity check on the memlock limit itself, are in the postscripts below the quoted thread.)
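Something along these lines could hang off the existing openib BTL parameter registration. This is a minimal sketch only, against the 1.4-era mca_base_param_reg_int() API; the parameter name ("rd_rsv_peer_cap"), its default, and the clamp are invented here for illustration, not actual OMPI code:

    /* Hypothetical sketch for the 1.4-series openib BTL: an MCA parameter
     * capping the peer count used to size reserved receive buffers.
     * "rd_rsv_peer_cap" is an invented name; this is not from any release. */
    #include "opal/mca/base/mca_base_param.h"
    #include "btl_openib.h"

    static int rd_rsv_peer_cap = 0;   /* 0 = keep today's unbounded behavior */

    static void reg_rd_rsv_peer_cap(void)
    {
        mca_base_param_reg_int(&mca_btl_openib_component.super.btl_version,
                               "rd_rsv_peer_cap",
                               "Upper bound on the peer count used when "
                               "sizing reserved per-peer receive buffers "
                               "(0 = no bound)",
                               false, false, 0, &rd_rsv_peer_cap);
    }

    /* Then, at the spot Gilbert patched (formerly "* nprocs"):        */
    /*   int peers = (rd_rsv_peer_cap > 0 && nprocs > rd_rsv_peer_cap) */
    /*                   ? rd_rsv_peer_cap : nprocs;                   */
    /*   ... u.pp_qp.rd_rsv * peers ...                                */

Since the parameter would be registered against the openib component, sites like Gilbert's could then pass something like "--mca btl_openib_rd_rsv_peer_cap 32" instead of patching the source.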
On Dec 31, 2010, at 8:31 AM, Gilbert Grosdidier wrote:

> Hello,
>
> Back to this painful issue, partly because I found a workaround,
> and partly because I would like to help.
>
> The initial post was:
> http://www.open-mpi.org/community/lists/users/2010/11/14843.php
> where I reported on OMPI 1.4.1, but the behavior was the same with 1.4.3.
>
> I spotted the culprit at line #274 of btl_openib.c, where I had to replace
>
>   mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * nprocs;
>
> with
>
>   mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * 32;
>
> mostly because nprocs = 4096 or 8192 in our case, which was leading to a
> very large memlock resource requirement.
>
> Since I don't believe there is a relevant MCA parameter to control this
> value accurately (am I wrong?), I would suggest adding such a switch.
>
> The workaround happens to work because the number of peers for any given
> node (apart from rank 0) is very low, but such a switch would definitely
> be useful whenever all-to-all communication is not required on a big
> cluster.
>
> Could someone comment on this?
>
> More info on request.
>
> Thanks, and Happy New Year to you all, G.
>
>
> On 29/11/2010 at 16:58, Gilbert Grosdidier wrote:
>> Hello John,
>>
>> Thanks for your feedback, but my investigations so far did not help:
>> the memlock limits on the compute nodes are actually set to unlimited.
>> This most probably means that even if the openib BTL hits some memory
>> allocation limit, the message I got is inaccurate, since the memlock
>> resource is indeed already unlimited.
>>
>> Instead, the BTL allocation mechanism seems to be stopped by the memlock
>> resource being exhausted, for example because it is attempting to create
>> too many buffers. I tried to explore this assumption by decreasing:
>> - btl_ofud_rd_num down to 32 or even 16
>> - btl_openib_cq_size down to 256 or even 64
>> but to no avail.
>>
>> So, I am asking for help about which other parameters could lead to
>> (locked?) memory exhaustion, knowing that the current memlock wall
>> shows up when:
>> - I run with 4096 or 8192 cores (with 2048, it's fine)
>> - there are 4 GB of RAM available per core
>> - each core communicates with no more than 8 neighbours, and they stay
>>   the same for the whole life of the job.
>>
>> Does this trigger any ideas for anyone?
>>
>> Thanks in advance, Best, Gilbert.
>>
>>
>> On 20 Nov 2010, at 19:27, John Hearns wrote:
>>
>>> On 20 November 2010 16:31, Gilbert Grosdidier wrote:
>>>> Hello,
>>>
>>> Hello Gilbert.
>>>
>>> I manage ICE clusters also.
>>>
>>> Could you please have a look at /etc/init.d/pbs on the compute blades?
>>>
>>> Do you have something like:
>>>
>>> if [ "${PBS_START_MOM}" -gt 0 ] ; then
>>>     if check_prog "mom" ; then
>>>         echo "PBS mom already running."
>>>     else
>>>         check_maxsys
>>>         site_mom_startup
>>>         if [ -f /etc/sgi-release -o -f /etc/sgi-compute-node-release ] ; then
>>>             MEMLOCKLIM=`ulimit -l`
>>>             NOFILESLIM=`ulimit -n`
>>>             STACKLIM=`ulimit -s`
>>>             ulimit -l unlimited
>>>             ulimit -n 16384
>>>             ulimit -s unlimited
>>>         fi
>
> --
> Regards, Gilbert.
>
> *---------------------------------------------------------------------*
>  Gilbert Grosdidier               gilbert.grosdid...@in2p3.fr
>  LAL / IN2P3 / CNRS               Phone : +33 1 6446 8909
>  Faculté des Sciences, Bat. 200   Fax   : +33 1 6446 8546
>  B.P. 34, F-91898 Orsay Cedex (FRANCE)
> *---------------------------------------------------------------------*

--
Jeff Squyres
jsquy...@cisco.com
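P.S. To put rough numbers on Gilbert's observation: here is a back-of-the-envelope estimate of the locked memory implied by scaling the reserved receive buffers by nprocs. All the per-buffer constants below are invented for illustration; the real values come from the btl_openib_receive_queues specification, not from this sketch.

    /* Rough estimate of locked memory from per-peer receive reservations.
     * rd_rsv, num_qps and buf_bytes are ASSUMED values for illustration. */
    #include <stdio.h>

    int main(void)
    {
        const long nprocs    = 8192;      /* job size from the thread       */
        const long rd_rsv    = 4;         /* assumed reserved bufs per peer */
        const long num_qps   = 4;         /* assumed number of per-peer QPs */
        const long buf_bytes = 12 * 1024; /* assumed average buffer size    */

        long bufs  = rd_rsv * num_qps * nprocs;
        long bytes = bufs * buf_bytes;
        printf("%ld reserved buffers -> ~%ld MB locked per process\n",
               bufs, bytes / (1024 * 1024));
        /* Prints: 131072 reserved buffers -> ~1536 MB locked per process.
         * With Gilbert's "* 32" cap, the same estimate is a mere 6 MB. */
        return 0;
    }

Even with these made-up constants, ~1.5 GB of locked memory per process against 4 GB of RAM per core makes the reported exhaustion at 4096-8192 cores entirely plausible.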
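P.P.S. Since the thread also turned on whether the memlock limit is really unlimited for the processes that matter (ulimit settings in /etc/init.d/pbs are only inherited by jobs started under that pbs_mom, not by interactive shells): here is a minimal check, using POSIX getrlimit(2), that can be run inside a batch job to print the limit the MPI processes actually inherit.

    /* Print the RLIMIT_MEMLOCK value this process actually inherited.
     * Run it under the batch system (e.g. via mpirun) to see what the
     * MPI processes really get, as opposed to what a login shell reports. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        if (RLIM_INFINITY == rl.rlim_cur) {
            printf("RLIMIT_MEMLOCK: unlimited\n");
        } else {
            printf("RLIMIT_MEMLOCK: %llu bytes\n",
                   (unsigned long long) rl.rlim_cur);
        }
        return 0;
    }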