Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Gilles Gouaillardet
OK, I'll see what I can do :-) On 10/5/2015 12:39 PM, Ralph Castain wrote: I would consider that a bug, myself - if there is some resource available, we should use it. On Oct 4, 2015, at 5:42 PM, Gilles Gouaillardet wrote: Marcin, I ran a simple test with v1.10.1…

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Gilles Gouaillardet
Ralph and Marcin, here is a proof of concept for a fix (the assert should be replaced with proper error handling) for the v1.10 branch. If you have a chance to test it, please let me know the results. Cheers, Gilles. On 10/5/2015 1:08 PM, Gilles Gouaillardet wrote: OK, I'll see what I can do :-)…

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Ralph Castain
I think this is okay, in general. I would only make one change: I would only search for an alternative site if the binding policy wasn’t set by the user. If the user specifies a mapping/binding pattern, then we should error out as we cannot meet it. I did think of one alternative that might be…

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread marcin.krotkiewski
I have applied the patch to both 1.10.0 and 1.10.1rc1. For 1.10.0 it did not help - I am not sure how much (if at all) you want to pursue this. For 1.10.1rc1 I was so far unable to reproduce any binding problems with jobs of up to 128 tasks. Some cosmetic suggestions: the warning it all started with…

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Gilles Gouaillardet
Marcin, there is no need to pursue 1.10.0 since it is known to be broken for some scenarios. It would really help me if you could provide the logs I requested, so I can reproduce the issue and make sure we both talk about the same scenario. IMHO, there is no legitimate reason to -map-by hwthread…

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread marcin.krotkiewski
Hi, Gilles, you mentioned you had one failure with 1.10.1rc1 and -bind-to core. Could you please send the full details (script, allocation, and output)? In your slurm script, you can do srun -N $SLURM_NNODES -n $SLURM_NNODES --cpu_bind=none -l grep Cpus_allowed_list /proc/self/status before invoking…
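
A minimal sketch of that diagnostic inside a SLURM batch script (the job parameters and application name are illustrative, not from the thread):

    #!/bin/bash
    #SBATCH --ntasks=8
    # Print the cpuset SLURM actually granted on each node, one line per
    # node (-l prefixes every output line with the task id), with binding
    # disabled so we see the full allocation rather than a pinned cpu:
    srun -N $SLURM_NNODES -n $SLURM_NNODES --cpu_bind=none -l \
        grep Cpus_allowed_list /proc/self/status
    # ...then launch the MPI job as usual:
    mpirun ./my_app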

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Ralph Castain
Thanks Marcin. I think we have three things we need to address: 1. the warning needs to be emitted regardless of whether or not --report-bindings was given. Not sure how that warning got “covered” by the option, but it is clearly a bug. 2. improve the warning to include binding info - relatively…

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-10-05 Thread Dave Love
Mike Dubman writes: > what is your command line and setup? (ofed version, distro) > > This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD > > + mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy > dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_dev…
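
Reconstructing the truncated command as a hedged sketch: the benchmark binary (osu_latency) is a guess, and completing the cut-off MCA parameter as rmaps_dist_device is an assumption, not taken from the post:

    # Map the two ranks by distance to the given IB device and measure
    # point-to-point latency over MXM/UD:
    mpirun -np 2 -bind-to core -display-map \
        -mca rmaps_base_mapping_policy dist:span \
        -mca rmaps_dist_device mlx5_3 \
        -x MXM_RDMA_PORTS=mlx5_3:1 \
        ./osu_latency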

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-10-05 Thread Dave Love
Mike Dubman writes: > right, it is not an attribute of mxm, but a general effect. Thanks. That's the sort of thing we can investigate, but then the messages from MXM are very misleading. > and you are right again - performance engineering will always be needed for > best performance in some cases.

[OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread marcin.krotkiewski
Yet another question about cpu binding under the SLURM environment. Short version: will OpenMPI support SLURM_CPUS_PER_TASK for the purpose of cpu binding? Full version: when you allocate a job like, e.g., this: salloc --ntasks=2 --cpus-per-task=4, SLURM will allocate 8 cores in total, 4 for each…
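
A short sketch of the scenario, assuming a generic hybrid binary; the export line feeding SLURM_CPUS_PER_TASK to OpenMP is an illustration, not something Open MPI did automatically at the time (the replies below arrive at the explicit pe=N qualifier):

    # Allocate 2 MPI tasks with 4 cores each (8 cores total):
    salloc --ntasks=2 --cpus-per-task=4
    # Inside the allocation, SLURM exports the per-task cpu count:
    echo $SLURM_CPUS_PER_TASK   # -> 4
    # A hybrid job wants each rank bound to 4 cores, with its OpenMP
    # threads confined to them:
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    mpirun ./hybrid_app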

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread Ralph Castain
You would presently do: mpirun --map-by core:pe=4 to get what you are seeking. If we don’t already set that qualifier when we see “cpus_per_task”, then we probably should do so, as there isn’t any reason to make you set it twice (well, other than trying to track which envar slurm is using now).

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread marcin.krotkiewski
Ralph, thank you for a fast response! Sounds very good; unfortunately, I get an error: $ mpirun --map-by core:pe=4 ./affinity -- A request for multiple cpus-per-proc was given, but a directive was also given to map to an object…

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread Ralph Castain
Hmmm…okay, try -map-by socket:pe=4. We’ll still hit the asymmetric topology issue, but otherwise this should work. > On Oct 5, 2015, at 1:25 PM, marcin.krotkiewski wrote: > > Ralph, > > Thank you for a fast response! Sounds very good, unfortunately I get an error: > > $ mpirun --map-by core…

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Jeff Squyres (jsquyres)
I filed an issue to track this problem here: https://github.com/open-mpi/ompi/issues/978 > On Oct 5, 2015, at 1:01 PM, Ralph Castain wrote: > > Thanks Marcin. I think we have three things we need to address: > > 1. the warning needs to be emitted regardless of whether or not > --report-bi…

Re: [OMPI users] [Open MPI Announce] Open MPI v1.10.1rc1 release

2015-10-05 Thread Jeff Squyres (jsquyres)
On Oct 3, 2015, at 9:14 AM, Dimitar Pashov wrote: > > Hi, I have a pet bug causing silent data corruption here: > https://github.com/open-mpi/ompi/issues/965 > which seems to have a fix committed some time later. I've tested v1.10.1rc1 > now and it still has the issue. I hope the fix makes it…

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread tmishima
Hi Ralph, it's been a long time. The option "map-by core" does not work when pe=N > 1 is specified. So, you should use "map-by slot:pe=N", as far as I remember. Regards, Tetsuya Mishima. On 2015/10/06 5:40:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM": > Hmmm…okay, try…
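
Combining the thread's corrections, a hedged sketch of the invocation that should work for the salloc example above (./affinity is the test binary mentioned earlier in the thread):

    # 2 tasks x 4 cores each: map by slot with 4 processing elements
    # per process; "core:pe=4" is rejected, per Tetsuya Mishima:
    mpirun --map-by slot:pe=4 ./affinity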

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread Ralph Castain
Ah, yes - thanks! It’s been so long since I played with that option that I honestly forgot :-) Hope you are doing well! Ralph > On Oct 5, 2015, at 4:04 PM, tmish...@jcity.maeda.co.jp wrote: > > Hi Ralph, it's been a long time. > > The option "map-by core" does not work when pe=N > 1 is specified.

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread tmishima
I'm doing quite well, thank you. I'm involved in a big project and so am very busy now. But I still try to keep watching these mailing lists. Regards, Tetsuya Mishima. On 2015/10/06 8:17:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM": > Ah, yes - thanks! It’s been so long…