I've got a system running Open MPI 1.1.2 on SLES 10, with the OFED 1.0 drivers.

The code runs without issues over Gigabit Ethernet/TCP on Open MPI...

The code does some memory allocation, communication, etc.; the developer wrote it to stress the network fabric, and it can be submitted if necessary.
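
In rough outline, it does something like the following (a simplified sketch for illustration only; the buffer size, iteration count, and ring exchange are placeholders, not the actual code):

    /* Sketch of the kind of stress test described above -- NOT the actual
     * code; sizes, loop count, and the communication pattern are
     * illustrative placeholders. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank, size, iter;
        const int buf_len = 1 << 20;   /* 1 MiB per message (arbitrary) */
        char *sendbuf, *recvbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (iter = 0; iter < 100; iter++) {
            /* Allocate fresh buffers each pass so new memory keeps getting
             * registered with the HCA. */
            sendbuf = malloc(buf_len);
            recvbuf = malloc(buf_len);
            memset(sendbuf, rank, buf_len);

            /* Exchange data with the neighboring ranks in a ring. */
            MPI_Sendrecv(sendbuf, buf_len, MPI_CHAR, (rank + 1) % size, 0,
                         recvbuf, buf_len, MPI_CHAR, (rank + size - 1) % size, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            free(sendbuf);
            free(recvbuf);
        }

        MPI_Finalize();
        return 0;
    }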

The job is being run on four nodes (two dual-core CPUs each, 16 cores total), with DDR IB.

Non-MPI bandwidth tests don't show any issues, but that doesn't necessarily mean things work well over MPI.

The error, at job start, is something to the effect of the following (transcribed over the phone):
 mca_mpool_openib_register:  cannot allocate memory
  .
  .
  .
 Error creating low priority CQ for MTHCA0:  Cannot allocate memory.

What has to happen for this message to be thrown? (I've seen IB fabric instability with OpenIB before, but I don't recall this being one of the errors it produced.)

Also, is there any chance the error could be caused by mismatched libraries (from a different build of Open MPI)?

(And I apologize for firing this off before I know more; I'm still gathering data as I learn more...)
--
Troy Telford
