That might well be a good idea (create an MCA param for the number of
send/receive CQEs).
It certainly seems that OMPI shouldn't be scaling *any* IB resource based on
the number of peer processes without at least some kind of upper bound.
Perhaps an IB vendor should reply here...
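In the meantime, it is worth checking which CQ-related parameters the openib
BTL already exposes on your install and pinning one explicitly on the mpirun
command line; btl_openib_cq_size is the name I have in mind for the 1.4 series,
but please verify with ompi_info before relying on it, and treat the values
below as placeholders rather than a recommendation:

    # list the CQ-related parameters this build of the openib BTL knows about
    ompi_info --param btl openib | grep -i cq

    # sketch: cap the completion queue size explicitly instead of letting it
    # scale with the number of peers (parameter name and value to be confirmed)
    mpirun --mca btl openib,sm,self --mca btl_openib_cq_size 4096 -np 4096 ./my_app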
Bonjour,
Back to this painful issue, partly because I found a workaround,
and partly because I would like to help.
The initial post was:
http://www.open-mpi.org/community/lists/users/2010/11/14843.php
where I reported the problem with OMPI 1.4.1; it was the same with 1.4.3.
I spotted the culprit ...
Bonjour John,
Thanks for your feedback, but my investigations so far did not help:
the memlock limits on the compute nodes are actually set to unlimited.
This most probably means that even if the btl_openib hits some memory
allocation limit, the message I got is inaccurate, because the memlock
resource ...
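One way to double-check what the MPI processes themselves inherit (the
interactive shell and the batch-launched environment do not always agree) is
to query the limit through the same launcher the job uses; the hostname below
is only a placeholder for one of the compute blades:

    # limit seen by an interactive shell on a compute blade
    ulimit -l

    # limit seen by a process started through the MPI launcher itself
    mpirun -np 1 --host r1i0n0 sh -c 'ulimit -l'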
On 20 November 2010 16:31, Gilbert Grosdidier wrote:
> Bonjour,
Bonjour Gilbert.
I manage ICE clusters also.
Could you please have a look at /etc/init.d/pbs on the compute blades?
Do you have something like:

    if [ "${PBS_START_MOM}" -gt 0 ] ; then
        if check_prog "mom" ; then
            ...
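The reason I ask: limits configured in /etc/security/limits.conf are applied
to login sessions, not to daemons started at boot, so pbs_mom (and every MPI
rank it spawns) can still run with the distribution's small default memlock
limit even when an interactive shell on the blade reports unlimited. A common
workaround, sketched below for illustration only (the exact placement depends
on your init script), is to raise the limit just before the mom is started:

    # hypothetical addition to /etc/init.d/pbs, before pbs_mom is launched,
    # so that jobs started under the mom inherit an unlimited memlock limit
    ulimit -l unlimited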
Bonjour,
I am afraid I got a weird issue when running an OpenMPI job using OpenIB
on an SGI ICE cluster with 4096 cores (or larger), and the FAQ does not help.
The OMPI version is 1.4.1, and it is running just fine with a smaller number of
cores (up to 512).
The error message is the following ...