Hi - I’m trying to get OpenMPI working on a newly configured CentOS 7 system,
and I’m not even sure what information would be useful to provide. I’m using
the CentOS built-in libibverbs and/or libfabric, and I configure Open MPI with
just
--with-verbs --with-ofi --prefix=$DEST
I also tried --without-ofi, with no change. Basically, I can run with "--mca btl
self,vader", but if I try "--mca btl,openib" I get an error from each process:
[compute-0-0][[24658,1],5][connect/btl_openib_connect_udcm.c:1245:udcm_rc_qp_to_rtr]
error modifing QP to RTR errno says Invalid argument
If I don't specify the btl, it appears to try to set up openib with the same
errors, then crashes with a free()-related segfault, presumably when it tries
to actually use vader.
The machine seems to be able to see its IB interface, as reported by things
like ibstatus or ibv_devinfo. I’m not sure what else to look for. I also
confirmed that “ulimit -l” reports unlimited.
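If it would help, the checks mentioned above could be gathered in one go with
something like the following sketch. The helper name run_if_present is made up
for illustration, and ompi_info is an addition not mentioned above (it ships
with Open MPI and lists the components that were built); any tool that is not
installed is simply reported as skipped.

```shell
#!/bin/sh
# Sketch only: collect basic InfiniBand / Open MPI diagnostics in one pass.
# run_if_present is a hypothetical helper name, not part of any tool.

# Run a command if it is on PATH; otherwise note that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "skipped: $1 not found"
    fi
}

# Locked-memory limit for this shell (openib generally wants this unlimited).
echo "== ulimit -l =="
ulimit -l

# State of the IB ports and devices as seen by the verbs stack.
run_if_present ibstatus
run_if_present ibv_devinfo

# Components (including btl) built into this Open MPI install.
run_if_present ompi_info
```

Running it on a compute node and on the head node and comparing the two
outputs might show whether the nodes see the hardware differently.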
Does anyone have any suggestions as to how to diagnose this issue?
thanks,
Noam
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users