[OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
Prentice, setting aside the obvious question of whether your FM (fabric manager) is running and the fabric is in an active state, it sounds like you're exhausting a resource on the cards. Ralph is correct that support for QLogic cards is long past, but I'll see what I can dig up in the archives on Monday to see if

Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Patrick Bégou via users
On 08/05/2020 at 21:56, Prentice Bisbal via users wrote:
> We often get the following errors when more than one job runs on the same compute node. We are using Slurm with OpenMPI. The IB cards are QLogic using PSM:
>
> 10698ipath_userinit: assign_context command failed: Network is down
> no

Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
That's it! I was trying to remember what the setting was, but I haven't worked on those HCAs since around 2012, so my memory of it was faint. That said, I found the Intel TrueScale manual online at https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/OFED_Host_Software_UserG

Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Patrick Bégou via users
This hardware has been working for nearly 10 years, across several generations of nodes and OpenMPI versions, without any problem. Today it is possible to find refurbished parts at low prices on the web, and they can help in building small clusters. It is really more efficient than 10Gb Ethernet for parallel codes due to t

[OMPI users] Memchecker and MPI_Comm_spawn

2020-05-09 Thread Mccall, Kurt E. (MSFC-EV41) via users
How can I run OpenMPI's Memchecker on a process created by MPI_Comm_spawn()? I've configured OpenMPI 4.0.3 for Memchecker, along with Valgrind 3.15.0, and it works quite well on processes created directly by mpiexec. I tried to do something analogous by prepending "valgrind" onto the command

Re: [OMPI users] Memchecker and MPI_Comm_spawn

2020-05-09 Thread Gilles Gouaillardet via users
Kurt, the error is that "valgrind myApp" is not an executable (it is a command that a shell can interpret), so you have several options: - use a wrapper (e.g. myApp.valgrind) that forks and execs "valgrind myApp" - call MPI_Comm_spawn("valgrind", argv, ...) after inserting "myApp" at the beginning of argv -