Re: [OMPI users] Hybrid MPI+OpenMP benchmarks (looking for)

2017-10-10 Thread Peter Kjellström
HPGMG-FV is easy to build and runs serial, MPI-only, OpenMP-only, and
hybrid MPI+OpenMP.
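For reference, a typical hybrid launch looks something like the following sketch (the thread count, rank count, and the hpgmg-fv arguments are illustrative placeholders; check the project's README for the actual argument meanings):

```shell
# Hypothetical hybrid run: 8 MPI ranks, 4 OpenMP threads per rank.
# -x exports the environment variable to all ranks (Open MPI mpirun).
export OMP_NUM_THREADS=4
mpirun -np 8 -x OMP_NUM_THREADS ./hpgmg-fv 7 8
```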

/Peter

On Mon, 9 Oct 2017 17:54:02 +
"Sasso, John (GE Digital, consultant)"  wrote:

> I am looking for a decent hybrid MPI+OpenMP benchmark utility which I
> can easily build and run with OpenMPI 1.6.5 (at least) and OpenMP
> under Linux, using GCC build of OpenMPI as well as the Intel Compiler
> suite.  I have looked at CP2K but that is much too complex a build
> for its own good (I managed to build all the prerequisite libraries,
> only to have the build of cp2k itself just fail).  Also looked at
> HOMB 1.0.
> 
> I am wondering what others have used.  The build should be simple and
> not require a large # of prereq libraries to build beforehand.
> Thanks!
> 
> --john

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


[OMPI users] alltoallv

2017-10-10 Thread Michael Di Domenico
I'm getting stuck trying to run some fairly large IMB-MPI alltoall
tests under Open MPI 2.0.2 on RHEL 7.4.

I have two different clusters, one running Mellanox FDR10 and one
running QLogic QDR.

If I issue

mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv

the job just stalls after IMB-MPI1 prints the "List of Benchmarks to
run: Alltoallv" line.

If I switch it to alltoall, the test does progress.

Often when running various-size alltoalls I'll get

"too many retries sending message to <>:<>, giving up"

I'm able to use InfiniBand just fine (our Lustre filesystem mounts
over it) and I have other MPI programs running.

The problem only seems to surface when I run alltoall-type collectives.

Any thoughts on debugging where the failures are? I might just need to
turn up the debugging, but I'm not sure where.
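One hedged starting point: the MCA verbosity knobs below are standard Open MPI parameters for the layers most likely involved in a collective stall; the useful verbosity level varies by component, and the output can be very large at 1024 ranks.

```shell
# Byte-transfer-layer and collective-selection verbosity; 100 is the
# usual maximum for most components. Redirect per-rank output to files.
mpirun -n 1024 \
  --mca btl_base_verbose 100 \
  --mca coll_base_verbose 100 \
  --output-filename imb-debug \
  ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv
```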


Re: [OMPI users] OpenMPI 3.0.0, compilation using Intel icc 11.1 on Linux, error when compiling pmix_mmap

2017-10-10 Thread Ted Sussman
Hello all,

Thank you for your responses.

I worked around the issue by building and installing pmix-1.1.1
separately, to directory /opt/pmix-1.1.1, then using

--with-pmix=/opt/pmix-1.1.1

when configuring OpenMPI 3.0.0.
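The workaround, as a sketch (tarball name and install prefix are the ones assumed above; the trailing "..." stands for whatever other configure options you normally pass):

```shell
# Build and install PMIx 1.1.1 into its own prefix,
# then point the Open MPI configure at it.
tar xzf pmix-1.1.1.tar.gz && cd pmix-1.1.1
./configure --prefix=/opt/pmix-1.1.1
make -j4 && make install

cd ../openmpi-3.0.0
./configure --with-pmix=/opt/pmix-1.1.1 ...
make -j4 && make install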

Sincerely,

Ted Sussman

On 2 Oct 2017 at 19:30, Jeff Squyres (jsquyres) wrote:

> I think that file does get included indirectly, but the real issue is the old 
> Intel compiler not supporting (struct timespec).  I.e., the solution might 
> well be "use a newer compiler."
> 
> 
> > On Oct 2, 2017, at 2:44 PM, r...@open-mpi.org wrote:
> > 
> > I correctly understood the file and the errors. I'm just pointing out that 
> > the referenced file cannot possibly contain a pointer to 
> > opal/threads/condition.h. There is no include in that file that can pull in 
> > that path.
> > 
> > 
> >> On Oct 2, 2017, at 11:39 AM, Jeff Squyres (jsquyres)  
> >> wrote:
> >> 
> >> Ralph --
> >> 
> >> I think he cited a typo in his email.  The actual file he is referring to 
> >> is 
> >> 
> >> -
> >> $ find . -name pmix_mmap.c
> >> ./opal/mca/pmix/pmix2x/pmix/src/sm/pmix_mmap.c
> >> -
> >> 
> >> From his log file, there appear to be two problems:
> >> 
> >> -
> >> sm/pmix_mmap.c(66): warning #266: function "posix_fallocate" declared 
> >> implicitly
> >> if (0 != (rc = posix_fallocate(sm_seg->seg_id, 0, size))) {
> >>^
> >> 
> >> sm/pmix_mmap.c(88): warning #266: function "ftruncate" declared implicitly
> >> if (0 != ftruncate(sm_seg->seg_id, size)) {
> >>  ^
> >> -
> >> 
> >> (which are just warnings but we should probably fix them)
> >> 
> >> and
> >> 
> >> -
> >> /opt/openmpi-3.0.0-Intel-src/opal/threads/condition.h(96): error: pointer 
> >> to incomplete class type is not allowed
> >> absolute.tv_sec = abstime->tv_sec;
> >>   ^
> >> -
> >> 
> >> This one appears to be the actual error.
> >> 
> >> abstime is a (struct timespec), which, according to 
> >> http://pubs.opengroup.org/onlinepubs/7908799/xsh/time.h.html, should be 
> >> declared in , which is definitely #included by 
> >> opal/threads/condition.h.
> >> 
> >> Since this error occurred with Intel 11.x but didn't occur with later 
> >> versions of the Intel compiler, I'm wondering if the Intel 11.x compiler 
> >> suite didn't support (struct timespec).
> >> 
> >> Can you stop using Intel 11.x and only use later versions of the Intel 
> >> compiler?
> >> 
> >> 
> >> 
> >>> On Oct 1, 2017, at 11:59 PM, r...@open-mpi.org wrote:
> >>> 
> >>> Afraid I'm rather stumped on this one. There is no such include file in 
> >>> pmix_mmap, nor is there any include file that might lead to it. You might 
> >>> try starting again from scratch to ensure you aren't getting some weird 
> >>> artifact.
> >>> 
> >>> 
>  On Sep 29, 2017, at 1:12 PM, Ted Sussman  wrote:
>  
>  Hello all,
>  
>  I downloaded openmpi-3.0.0.tar.gz and attempted to build it on my Red 
>  Hat Linux computer, kernel 2.6.18-194.el5.
>  
>  The C compiler used is Intel icc, version 11.1.
>  
>  The make failed when compiling pmix_mmap, with messages such as
>  
>  /opt/openmpi-3.0.0-Intel-src/opal/threads/conditions.h(96): error: 
>  pointer to incomplete class type is not allowed
>  
>    absolute.tv_sec = abstime->tv_sec;
>  
>  I have attached a tar file with the output from configure and the output 
>  from make.
>  
>  I was able to build openmpi-2.1.1 using the same computer and compiler.
>  
>  I was able to build openmpi-3.0.0 using a different computer, with icc 
>  version 14.0.4.
>  
>  Can you please tell me how I can avoid this compilation error, when 
>  using icc version 11.1?
>  
>  Sincerely,
>  
>  Ted Sussman
>  
>  
>   File information ---
>    File:  openmpi.tgz
>    Date:  29 Sep 2017, 15:59
>    Size:  41347 bytes.
>    Type:  Binary
> >>> 
> >> 
> >> 
> >> -- 
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> 

[OMPI users] RoCE device performance with large message size

2017-10-10 Thread Brendan Myers
Hello All,

I have a RoCE interoperability event starting next week and I was wondering
if anyone had any ideas to help me with a new vendor I am trying to help get
ready. 

I am using:

* Open MPI 2.1

* Intel MPI Benchmarks 2018

* OFED 3.18 (requirement from vendor)

* SLES 11 SP3 (requirement from vendor)

 

The problem seems to be that the device does not handle larger message sizes
well. I am sure they will be working on this, but I am hoping there may be a
way to complete an IMB run with some Open MPI parameter tweaking.

Sample of IMB output from a Sendrecv benchmark:

 

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]  Mbytes/sec
       262144          160       131.07       132.24       131.80     3964.56
       524288           80       277.42       284.57       281.57     3684.71
      1048576           40       461.16       474.83       470.02     4416.59
      2097152            3      1112.15   4294965.49   2147851.04        0.98
      4194304            2      2815.25   8589929.73   3222731.54        0.98

 

The last two rows (the 2 MB and 4 MB message sizes) show the problematic
results. This happens on many of the benchmarks at larger message sizes and
causes either a major slowdown or the job aborts with the error:

 

The InfiniBand retry count between two MPI processes has been exceeded.

 

If anyone has any thoughts on how I can complete the benchmarks without the
job aborting I would appreciate it. If anyone has ideas as to why a RoCE
device might show this issue I would take any information on offer. If more
data is required please let me know what is relevant.
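As a possible stopgap while the vendor investigates, the retry-count abort can sometimes be pushed back with the openib BTL's timeout and retry parameters. The values below are the maximums those parameters accept, not tuned recommendations, and they only hide the underlying loss, they don't fix it:

```shell
# Maximize the IB/RoCE transport timeout (4.096us * 2^31) and retry count
# for the openib BTL before re-running the failing benchmark.
mpirun -np 16 \
  --mca btl_openib_ib_timeout 31 \
  --mca btl_openib_ib_retry_count 7 \
  ./IMB-MPI1 Sendrecv
```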

 

 

Thank you,

Brendan T. W. Myers


Re: [OMPI users] RoCE device performance with large message size

2017-10-10 Thread Jeff Squyres (jsquyres)
Probably want to check to make sure that lossless ethernet is enabled 
everywhere (that's a common problem I've seen); otherwise, you end up in 
timeouts and retransmissions.

Check with your vendor on how to do layer-0 diagnostics, etc.

Also, if this is a new vendor, they should probably try running this themselves 
-- IMB is fairly abusive to the network stack and turns up many bugs in lower 
layers (drivers, firmware), etc.


> On Oct 10, 2017, at 3:29 PM, Brendan Myers  
> wrote:
> [...]


-- 
Jeff Squyres
jsquy...@cisco.com
