I've got a system that is running Open MPI 1.1.2, SLES 10, with the OFED
1.0 drivers.
The code runs without issues under Open MPI over Gigabit Ethernet/TCP.
The code does some memory allocation, communication, etc. The developer
wrote it to stress the network fabric, and I can submit it if necessary.
The job is being run on four nodes (two dual-core CPUs each, 16 cores
total), with DDR IB.
Non-MPI bandwidth tests don't seem to have issues, but that doesn't
necessarily mean things work well over MPI.
The error (upon job start) is something to the effect of (transcribed
from phone):
mca_mpool_openib_register: cannot allocate memory
.
.
.
Error creating low priority CQ for MTHCA0: Cannot allocate memory.
What has to happen for this message to get thrown? (I've seen IB fabric
instability with OpenIB before, and I don't recall this being one of the
errors I've seen).
Also, is there any chance that the error can be caused by mismatched
libraries (from a different compile of Open MPI)?
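One thing I can check while gathering data: registering memory with the
HCA pins pages, and pinned pages count against the per-process
locked-memory limit (RLIMIT_MEMLOCK). I have not confirmed that this limit
is involved on this system; that is only an assumption. Below is a minimal
sketch that prints the limit as the process sees it. The helper names
(format_limit, memlock_limits) are my own, not part of Open MPI or OFED.

```python
import resource


def format_limit(value):
    """Render an rlimit value (in bytes), showing RLIM_INFINITY as 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("max locked memory (soft):", format_limit(soft))
    print("max locked memory (hard):", format_limit(hard))
```

To be meaningful it should be started the same way the MPI processes are
started on each node, since the limit in an interactive shell can differ
from the one a remotely launched process inherits.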
(And I apologize for firing off this without knowing more; I'm still
gathering data as I learn more...)
--
Troy Telford