Re: [OMPI users] All nodes which are allocated for this job are already filled.

2020-04-29 Thread carlos aguni via users
Further debugging: I saw this and tried setting rmaps_base_oversubscribe = 1 and rmaps_base_inherit = 1 in $prefix/etc/openmpi-mca-params.conf on all nodes, with no luck. I think my problem is somehow related to this https://github.com/open-mpi/ompi/pull/

Re: [OMPI users] All nodes which are allocated for this job are already filled.

2020-04-29 Thread Martín via users
Hi Carlos, could you try ompi 4.0.1? Regards. Martín On 29 Apr 2020 02:20, carlos aguni via users wrote: Hi all, I'm trying to MPI_Spawn processes with no success. I'm facing the following error: All nodes which are allocated for this job are already filled. I'm sett

Re: [OMPI users] All nodes which are allocated for this job are already filled.

2020-04-29 Thread carlos aguni via users
Hi all, Thank you for your reply @Martin. Just tested with 4.0.1. It's all working now. It was actually a typo in the code, unbelievable. Can confirm it's working fine with: openmpi 4.0.1, openmpi 3.0.0, openmpi3/3.1.4 gnu8/8.3.0 ohpc. Thank you all. Regards, Carlos. On Wed, Apr 29, 2020 at

[OMPI users] kernel trap - divide by zero

2020-04-29 Thread Rob Scott (roscott2) via users
We are seeing a kernel trap in hwloc reported by a few customers. In one particular case, here are the details: hwloc-1.10.1, Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz. The offending code is in look_proc() due to cpuid function 0x1 returning 4 logical processors or possibly hwloc_flsl() ma
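The preview above is cut off, so the exact failing expression is not shown. As an illustration only (this is not hwloc's code), here is a minimal Python sketch of one way a divide-by-zero could arise from such values. It assumes a threads-per-core figure is obtained by integer division of a rounded logical-processor count by a core count, and is then used as a modulus. The names `flsl`, `max_logical_procs` and `thread_id` are hypothetical, and the core count of 8 in the usage note is an assumed value, not one taken from the report.

```python
def flsl(x: int) -> int:
    """Find-last-set: 1-based index of the highest set bit, 0 when x == 0."""
    return x.bit_length()


def max_logical_procs(cpuid1_ebx: int) -> int:
    """Round the count in EBX[23:16] of CPUID leaf 0x1 up to a power of two."""
    count = (cpuid1_ebx >> 16) & 0xFF
    if count <= 1:
        return 1
    return 1 << flsl(count - 1)


def thread_id(log_proc_id: int, max_log_procs: int, max_cores: int):
    """Return the thread index within a core, or None if it cannot be derived.

    Assumption: threads per core = max_log_procs // max_cores. When the
    reported core count exceeds the logical-processor count, this is 0,
    and an unguarded modulo by it would trap.
    """
    threads_per_core = max_log_procs // max_cores
    if threads_per_core == 0:
        return None  # guard instead of dividing by zero
    return log_proc_id % threads_per_core
```

With 4 logical processors reported and an assumed core count of 8, `thread_id(3, 4, 8)` takes the guarded path and returns None, where an unguarded modulo would fault.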

Re: [OMPI users] kernel trap - divide by zero

2020-04-29 Thread Brice Goglin via users
Hello. Both 1.10.1 and 1.11.10 are very old. Any chance you could try at least 1.11.13 or even 2.x on these machines? Unfortunately, I can't remember everything we changed in this code 5 years later. We are not aware of any issue on Intel Haswell, but it's not impossible something is buggy in the hardwar