That might well be a good idea (creating an MCA parameter for the number of 
send/receive CQEs).  

It certainly seems that OMPI shouldn't be scaling *any* IB resource based on 
the number of peer processes without at least some kind of upper bound.
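
As a standalone illustration of the kind of clamp I have in mind (a sketch 
only, not OMPI code; the cap value and names below are made up), the per-QP 
reserved receive descriptors would be sized for at most a fixed number of 
peers rather than for all of nprocs:

    /* Sketch: size reserved receive descriptors for at most "cap" peers.
     * In OMPI, a (hypothetical) MCA parameter could supply the cap. */
    #include <stdio.h>
    #include <stdint.h>

    static uint32_t capped_rd_count(uint32_t rd_rsv, uint32_t nprocs,
                                    uint32_t cap)
    {
        uint32_t peers = (nprocs < cap) ? nprocs : cap;
        return rd_rsv * peers;
    }

    int main(void)
    {
        /* e.g. rd_rsv = 4, 8192 processes: uncapped vs. capped at 32 */
        printf("uncapped: %u\n", capped_rd_count(4, 8192, 8192));
        printf("capped:   %u\n", capped_rd_count(4, 8192, 32));
        return 0;
    }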

Perhaps an IB vendor should reply here...



On Dec 31, 2010, at 8:31 AM, Gilbert Grosdidier wrote:

> Hello,
> 
>  Back to this painful issue, partly because I found a workaround,
> and partly because I would like to help.
> 
>  The initial post was:
> http://www.open-mpi.org/community/lists/users/2010/11/14843.php
> where I reported on OMPI 1.4.1, but it was the same with 1.4.3.
> 
>  I tracked the culprit down to line #274 of btl_openib.c, where I had to 
> replace
> mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * nprocs;
> with
> mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * 32;
> mostly because nprocs = 4096 or 8192 in our case, which led to a huge
> memlock resource requirement.
> 
>  Since I don't believe there is an MCA parameter to control this value 
> precisely (am I wrong?), I would suggest introducing such a switch.
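> 
>  A hedged sketch of what such a switch could look like at that spot (the 
> field rd_rsv_peer_cap and the variable num_rd are invented here for 
> illustration; this is not the actual OMPI code):
> 
>     /* Hypothetical MCA-controlled cap on how many peers the reserved
>        receive descriptors are sized for (rd_rsv_peer_cap is made up). */
>     uint32_t peers = nprocs;
>     if (peers > mca_btl_openib_component.rd_rsv_peer_cap) {
>         peers = mca_btl_openib_component.rd_rsv_peer_cap;
>     }
>     num_rd = mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv * peers;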
> 
>  It happens to work because the number of peers for any given node (apart 
> from rank 0) is very low, but it is definitely useful when all-to-all 
> communication is not required on a big cluster.
> 
>  Could someone comment on this?
> 
>  More info on request.
> 
>  Thanks,      Happy New Year to you all,       G.
> 
> 
> 
On 29/11/2010 16:58, Gilbert Grosdidier wrote:
>> Hello John,
>> 
>>  Thanks for your feedback, but my investigations so far have not helped:
>> the memlock limit on the compute nodes is actually set to unlimited.
>> This most probably means that even if btl_openib hits some memory
>> allocation limit, the message I got is misleading, because the memlock
>> resource is indeed already unlimited.
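>> 
>>  For what it's worth, here is a minimal standalone sketch to print the 
>> memlock limit a process actually sees at run time (launching it under 
>> mpirun would show what the job itself really gets):
>> 
>>     /* Sketch: report the effective RLIMIT_MEMLOCK of this process. */
>>     #include <stdio.h>
>>     #include <sys/resource.h>
>> 
>>     int main(void)
>>     {
>>         struct rlimit rl;
>>         if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
>>             perror("getrlimit");
>>             return 1;
>>         }
>>         if (rl.rlim_cur == RLIM_INFINITY)
>>             printf("memlock: unlimited\n");
>>         else
>>             printf("memlock: %llu bytes\n",
>>                    (unsigned long long) rl.rlim_cur);
>>         return 0;
>>     }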
>> 
>>  The BTL allocation mechanism then seems to be stopped by exhaustion of
>> the memlock resource, because it is attempting to create too many buffers,
>> for example. I tried to explore this assumption by decreasing:
>> - btl_ofud_rd_num down to 32 or even 16
>> - btl_openib_cq_size down to 256 or even 64
>> but to no avail.
>> 
>>  So, I am asking for help about which other parameters could lead to
>> (locked?) memory exhaustion, knowing that the current memlock wall shows up
>> - when I run with 4096 or 8192 cores (with 2048, it's fine)
>> - there are 4 GB of RAM available per core
>> - each core communicates with no more than 8 neighbours, and they
>> stay the same for the whole life of the job.
>> 
>>  Does this trigger any ideas for anyone?
>> 
>> 
>>  Thanks in advance,           Best,    Gilbert.
>> 
>> 
>> On 20 Nov 2010, at 19:27, John Hearns wrote:
>> 
>>> On 20 November 2010 16:31, Gilbert Grosdidier wrote:
>>>> Hello,
>>> 
>>> Hello Gilbert.
>>> 
>>> I manage ICE clusters also.
>>> 
>>> Please could you have a look at /etc/init.d/pbs on the compute blades?
>>> 
>>> Do you have something like:
>>> 
>>>    if [ "${PBS_START_MOM}" -gt 0 ] ; then
>>>      if check_prog "mom" ; then
>>>        echo "PBS mom already running."
>>>      else
>>>        check_maxsys
>>>        site_mom_startup
>>>        if [ -f /etc/sgi-release -o -f /etc/sgi-compute-node-release ] ; then
>>>            MEMLOCKLIM=`ulimit -l`
>>>            NOFILESLIM=`ulimit -n`
>>>            STACKLIM=`ulimit -s`
>>>            ulimit -l unlimited
>>>            ulimit -n 16384
>>>            ulimit -s unlimited
>>>        fi
> 
> -- 
>  Regards,   Gilbert.
> 
> --
> *---------------------------------------------------------------------*
>   Gilbert Grosdidier             gilbert.grosdid...@in2p3.fr
> 
>   LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
>   Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
>   B.P. 34, F-91898 Orsay Cedex (FRANCE)
> *---------------------------------------------------------------------*
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

