Re: [OMPI users] Hybrid MPI+OpenMP benchmarks (looking for)
HPGMG-FV is easy to build, and it can run serial, MPI-only, OpenMP-only, and MPI+OpenMP.

/Peter

On Mon, 9 Oct 2017 17:54:02, "Sasso, John (GE Digital, consultant)" wrote:

> I am looking for a decent hybrid MPI+OpenMP benchmark utility which I
> can easily build and run with OpenMPI 1.6.5 (at least) and OpenMP
> under Linux, using a GCC build of OpenMPI as well as the Intel Compiler
> suite. I have looked at CP2K, but that is much too complex a build
> for its own good (I managed to build all the prerequisite libraries,
> only to have the build of cp2k itself fail). I also looked at
> HOMB 1.0.
>
> I am wondering what others have used. The build should be simple and
> should not require a large number of prerequisite libraries to be built
> beforehand. Thanks!
>
> --john
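For a quick sanity check before (or alongside) a full benchmark like HPGMG-FV, a tiny hybrid smoke test confirms that the MPI library and the OpenMP runtime work together in one binary. The sketch below is purely illustrative (the file name, problem size, and output format are made up, not taken from any benchmark):

/* hybrid_check.c - minimal MPI+OpenMP smoke test (illustrative sketch, not an
 * official benchmark). Each rank sums a local series with OpenMP threads,
 * then the per-rank sums are combined with MPI_Reduce. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask for MPI_THREAD_FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long n = 1000000;
    double local = 0.0;
    double t0 = MPI_Wtime();

    /* Threaded compute phase. */
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < n; i++)
        local += 1.0 / (double)(i + 1);

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("ranks=%d threads=%d sum=%f time=%f s (thread support=%d)\n",
               nranks, omp_get_max_threads(), global, t1 - t0, provided);

    MPI_Finalize();
    return 0;
}

Something like "mpicc -fopenmp hybrid_check.c -o hybrid_check" should build it with a GCC-based Open MPI; with the Intel compilers the OpenMP flag is typically -qopenmp (older icc versions use -openmp). Run it with, e.g., OMP_NUM_THREADS=4 mpirun -np 2 ./hybrid_check.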
[OMPI users] alltoallv
I'm getting stuck trying to run some fairly large IMB-MPI1 alltoall tests under Open MPI 2.0.2 on RHEL 7.4. I have two different clusters, one running Mellanox FDR10 and one running QLogic QDR.

If I issue

  mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv

the job just stalls after IMB-MPI1 prints the "List of Benchmarks to run: Alltoallv" line. If I switch it to alltoall, the test does progress, but when running various alltoall sizes I often get "too many retries sending message to <>:<>, giving up".

I'm able to use InfiniBand just fine (our Lustre filesystem mounts over it) and I have other MPI programs running; the problem only seems to appear when I run alltoall-type collectives.

Any thoughts on how to debug where the failures are? I might just need to turn up the debugging, but I'm not sure where.
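One way to narrow this down is to take IMB out of the picture with a minimal MPI_Alltoallv reproducer and grow the per-rank message size until the hang appears; that at least tells you whether the stall is in the collective/transport itself or in IMB's buffer handling. The sketch below is only illustrative (the file name and default count are arbitrary):

/* alltoallv_test.c - minimal MPI_Alltoallv reproducer (illustrative sketch).
 * Run with e.g. "mpirun -n <N> ./alltoallv_test <ints_per_rank>" and increase
 * the per-rank count until the hang appears. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Number of ints sent to every other rank (default 1024). */
    int count = (argc > 1) ? atoi(argv[1]) : 1024;

    int *sendbuf = malloc((size_t)size * count * sizeof(int));
    int *recvbuf = malloc((size_t)size * count * sizeof(int));
    int *counts  = malloc(size * sizeof(int));
    int *displs  = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) {
        counts[i] = count;
        displs[i] = i * count;
    }
    for (int i = 0; i < size * count; i++)
        sendbuf[i] = rank;

    double t0 = MPI_Wtime();
    MPI_Alltoallv(sendbuf, counts, displs, MPI_INT,
                  recvbuf, counts, displs, MPI_INT, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("alltoallv of %d ints per pair across %d ranks: %.3f s\n",
               count, size, t1 - t0);

    free(sendbuf); free(recvbuf); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}

If the reproducer hangs at the same scale, adding --mca btl_base_verbose 100 to the mpirun line should at least show which transport is in use and what it is doing when the collective stalls.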
Re: [OMPI users] OpenMPI 3.0.0, compilation using Intel icc 11.1 on Linux, error when compiling pmix_mmap
Hello all,

Thank you for your responses. I worked around the issue by building and
installing pmix-1.1.1 separately, to directory /opt/pmix-1.1.1, then using
--with-pmix=/opt/pmix-1.1.1 when configuring OpenMPI 3.0.0.

Sincerely,
Ted Sussman

On 2 Oct 2017 at 19:30, Jeff Squyres (jsquyres) wrote:

> I think that file does get included indirectly, but the real issue is the old
> Intel compiler not supporting (struct timespec). I.e., the solution might
> well be "use a newer compiler."
>
> > On Oct 2, 2017, at 2:44 PM, r...@open-mpi.org wrote:
> >
> > I correctly understood the file and the errors. I'm just pointing out that
> > the referenced file cannot possibly contain a pointer to
> > opal/threads/condition.h. There is no include in that file that can pull in
> > that path.
> >
> >> On Oct 2, 2017, at 11:39 AM, Jeff Squyres (jsquyres) wrote:
> >>
> >> Ralph --
> >>
> >> I think he cited a typo in his email. The actual file he is referring to
> >> is
> >>
> >> -----
> >> $ find . -name pmix_mmap.c
> >> ./opal/mca/pmix/pmix2x/pmix/src/sm/pmix_mmap.c
> >> -----
> >>
> >> From his log file, there appear to be two problems:
> >>
> >> -----
> >> sm/pmix_mmap.c(66): warning #266: function "posix_fallocate" declared implicitly
> >>     if (0 != (rc = posix_fallocate(sm_seg->seg_id, 0, size))) {
> >>                    ^
> >>
> >> sm/pmix_mmap.c(88): warning #266: function "ftruncate" declared implicitly
> >>     if (0 != ftruncate(sm_seg->seg_id, size)) {
> >>              ^
> >> -----
> >>
> >> (which are just warnings, but we should probably fix them)
> >>
> >> and
> >>
> >> -----
> >> /opt/openmpi-3.0.0-Intel-src/opal/threads/condition.h(96): error: pointer
> >> to incomplete class type is not allowed
> >>     absolute.tv_sec = abstime->tv_sec;
> >>                       ^
> >> -----
> >>
> >> This one appears to be the actual error.
> >>
> >> abstime is a (struct timespec), which, according to
> >> http://pubs.opengroup.org/onlinepubs/7908799/xsh/time.h.html, should be
> >> declared in <time.h>, which is definitely #included by
> >> opal/threads/condition.h.
> >>
> >> Since this error occurred with Intel 11.x but didn't occur with later
> >> versions of the Intel compiler, I'm wondering if the Intel 11.x compiler
> >> suite didn't support (struct timespec).
> >>
> >> Can you stop using Intel 11.x and only use later versions of the Intel
> >> compiler?
> >>
> >>> On Oct 1, 2017, at 11:59 PM, r...@open-mpi.org wrote:
> >>>
> >>> Afraid I'm rather stumped on this one. There is no such include file in
> >>> pmix_mmap, nor is there any include file that might lead to it. You might
> >>> try starting again from scratch to ensure you aren't getting some weird
> >>> artifact.
> >>>
> >>>> On Sep 29, 2017, at 1:12 PM, Ted Sussman wrote:
> >>>>
> >>>> Hello all,
> >>>>
> >>>> I downloaded openmpi-3.0.0.tar.gz and attempted to build it on my Red
> >>>> Hat Linux computer, kernel 2.6.18-194.el5.
> >>>>
> >>>> The C compiler used is Intel icc, version 11.1.
> >>>>
> >>>> The make failed when compiling pmix_mmap, with messages such as
> >>>>
> >>>> /opt/openmpi-3.0.0-Intel-src/opal/threads/conditions.h(96): error:
> >>>> pointer to incomplete class type is not allowed
> >>>>
> >>>> absolute.tv_sec = abstime->tv_sec;
> >>>>
> >>>> I have attached a tar file with the output from configure and the
> >>>> output from make.
> >>>>
> >>>> I was able to build openmpi-2.1.1 using the same computer and compiler.
> >>>>
> >>>> I was able to build openmpi-3.0.0 using a different computer, with icc
> >>>> version 14.0.4.
> >>>>
> >>>> Can you please tell me how I can avoid this compilation error, when
> >>>> using icc version 11.1?
> >>>>
> >>>> Sincerely,
> >>>> Ted Sussman
> >>>>
> >>>> The following section of this message contains a file attachment
> >>>> prepared for transmission using the Internet MIME message format.
> >>>>
> >>>> File information ---
> >>>>   File: openmpi.tgz
> >>>>   Date: 29 Sep 2017, 15:59
> >>>>   Size: 41347 bytes.
> >>>>   Type: Binary
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
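A side note for anyone hitting the same build failure: the "pointer to incomplete class type" error means that, at that point in condition.h, the compiler has only seen an incomplete declaration of struct timespec, and the #266 warnings mean the prototypes for posix_fallocate() and ftruncate() were never pulled in. A standalone probe such as the one below (illustrative only; the feature-test macro value is the standard POSIX.1-2001 one, not something taken from the Open MPI sources) can show what a given compiler/header combination actually exposes, independent of the Open MPI tree:

/* timespec_check.c - standalone probe for the headers involved in the
 * pmix_mmap build failure (illustrative, not part of Open MPI).
 *
 * _POSIX_C_SOURCE >= 200112L is what normally makes posix_fallocate()
 * visible; <time.h> is where struct timespec is fully defined and
 * <unistd.h> declares ftruncate(). */
#define _POSIX_C_SOURCE 200112L
#include <time.h>      /* struct timespec */
#include <fcntl.h>     /* posix_fallocate() */
#include <unistd.h>    /* ftruncate() */
#include <stdio.h>

int main(void)
{
    struct timespec ts = { 0, 0 };
    struct timespec *p = &ts;

    /* Same pattern as condition.h line 96; this only compiles if
     * struct timespec is a complete type here. */
    ts.tv_sec = p->tv_sec;

    /* These calls fail harmlessly (bad file descriptor) but only compile
     * cleanly if the prototypes were actually pulled in. */
    int rc1 = posix_fallocate(-1, 0, 0);
    int rc2 = ftruncate(-1, 0);

    printf("struct timespec size = %zu, posix_fallocate rc = %d, ftruncate rc = %d\n",
           sizeof ts, rc1, rc2);
    return 0;
}

If this probe fails to compile with icc 11.1 but succeeds with a newer icc against the same system headers, that points at the compiler rather than the headers, which would match the observation that icc 14.0.4 built OpenMPI 3.0.0 without trouble.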
[OMPI users] RoCE device performance with large message size
Hello All,

I have a RoCE interoperability event starting next week and I was wondering if
anyone had any ideas to help me with a new vendor I am trying to help get ready.

I am using:
* Open MPI 2.1
* Intel MPI Benchmarks 2018
* OFED 3.18 (requirement from vendor)
* SLES 11 SP3 (requirement from vendor)

The problem seems to be that the device does not handle larger message sizes
well. I am sure the vendor will be working on this, but I am hoping there may
be a way to complete an IMB run with some Open MPI parameter tweaking.

Sample of IMB output from a Sendrecv benchmark:

      #bytes  #repetitions   t_min[usec]   t_max[usec]   t_avg[usec]   Mbytes/sec
      262144           160        131.07        132.24        131.80      3964.56
      524288            80        277.42        284.57        281.57      3684.71
     1048576            40        461.16        474.83        470.02      4416.59
     2097152             3       1112.15    4294965.49    2147851.04         0.98
     4194304             2       2815.25    8589929.73    3222731.54         0.98

The last two rows (t_max exploding to millions of microseconds and throughput
collapsing to under 1 Mbyte/sec) are the problematic results. This happens on
many of the benchmarks at larger message sizes and causes either a major
slowdown, or it causes the job to abort with the error:

    The InfiniBand retry count between two MPI processes has been exceeded.

If anyone has any thoughts on how I can complete the benchmarks without the job
aborting, I would appreciate it. If anyone has ideas as to why a RoCE device
might show this issue, I would take any information on offer. If more data is
required, please let me know what is relevant.

Thank you,
Brendan T. W. Myers
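Assuming the openib BTL is what carries the RoCE traffic here, the usual first knobs for that abort are the btl_openib_ib_retry_count and btl_openib_ib_timeout MCA parameters, although raising them only papers over whatever is losing packets. To confirm whether the device behaves as long as no single transfer exceeds some threshold, a standalone probe along these lines may help (purely a sketch, not part of IMB; the 4 MB total and 256 KB chunk size are guesses based on the table above):

/* chunked_sendrecv.c - probe whether a large transfer succeeds when split
 * into smaller pieces (illustrative sketch; sizes are arbitrary guesses).
 * Run with exactly 2 ranks: each rank exchanges a large buffer with its
 * peer, CHUNK bytes at a time. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOTAL  (4 * 1024 * 1024)   /* 4 MB, the size that misbehaves above */
#define CHUNK  (256 * 1024)        /* a size the device handled well above */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    char *sendbuf = malloc(TOTAL);
    char *recvbuf = malloc(TOTAL);
    memset(sendbuf, rank, TOTAL);
    int peer = 1 - rank;

    double t0 = MPI_Wtime();
    for (int off = 0; off < TOTAL; off += CHUNK) {
        MPI_Sendrecv(sendbuf + off, CHUNK, MPI_BYTE, peer, 0,
                     recvbuf + off, CHUNK, MPI_BYTE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d bytes exchanged in %d-byte chunks: %.3f ms\n",
               TOTAL, CHUNK, (t1 - t0) * 1e3);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

If the chunked exchange completes at full speed while a single 4 MB MPI_Sendrecv of the same data does not, that is a fairly strong hint the problem is in how the device handles large transfers rather than in the host stack.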
Re: [OMPI users] RoCE device performance with large message size
Probably want to check to make sure that lossless Ethernet is enabled
everywhere (that's a common problem I've seen); otherwise, you end up in
timeouts and retransmissions. Check with your vendor on how to do layer-0
diagnostics, etc.

Also, if this is a new vendor, they should probably try running this
themselves -- IMB is fairly abusive to the network stack and turns up many
bugs in the lower layers (drivers, firmware), etc.

> On Oct 10, 2017, at 3:29 PM, Brendan Myers wrote:
>
> Hello All,
>
> I have a RoCE interoperability event starting next week and I was wondering
> if anyone had any ideas to help me with a new vendor I am trying to help get
> ready.
>
> I am using:
> * Open MPI 2.1
> * Intel MPI Benchmarks 2018
> * OFED 3.18 (requirement from vendor)
> * SLES 11 SP3 (requirement from vendor)
>
> The problem seems to be that the device does not handle larger message sizes
> well. I am sure they will be working on this, but I am hoping there may be a
> way to complete an IMB run with some Open MPI parameter tweaking.
>
> Sample of IMB output from a Sendrecv benchmark:
>
>       #bytes  #repetitions   t_min[usec]   t_max[usec]   t_avg[usec]   Mbytes/sec
>       262144           160        131.07        132.24        131.80      3964.56
>       524288            80        277.42        284.57        281.57      3684.71
>      1048576            40        461.16        474.83        470.02      4416.59
>      2097152             3       1112.15    4294965.49    2147851.04         0.98
>      4194304             2       2815.25    8589929.73    3222731.54         0.98
>
> The last two rows are what look like the problematic results. This happens
> on many of the benchmarks at larger message sizes and causes either a major
> slowdown, or it causes the job to abort with the error:
>
>     The InfiniBand retry count between two MPI processes has been exceeded.
>
> If anyone has any thoughts on how I can complete the benchmarks without the
> job aborting, I would appreciate it. If anyone has ideas as to why a RoCE
> device might show this issue, I would take any information on offer. If more
> data is required, please let me know what is relevant.
>
> Thank you,
> Brendan T. W. Myers

--
Jeff Squyres
jsquy...@cisco.com