Dear Gilles,
            As per your suggestion, I tried the inline patch as discussed in https://github.com/open-mpi/ompi/pull/8622#issuecomment-800776864 .
This has completely fixed the regression for the remaining test cases in the FFTW MPI in-built test bench, which had persisted even after applying the git patch https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/8623.patch that you merged.
So it seems there is a performance difference between asm volatile("": : :"memory"); and __atomic_thread_fence(__ATOMIC_ACQUIRE) on x86_64.
I would request you to please make this change and merge it into the respective Open MPI branches - please let me know when that takes place. I also request you to plan for an early 4.1.1rc2 release, at least by June 2021.

With Regards,
S. Biplab Raut
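(For reference, the two barrier forms mentioned above can be compared in isolation. Below is a minimal standalone sketch, assuming GCC on x86_64; the file and function names are illustrative and are not taken from the Open MPI sources. Both forms are expected to emit no fence instruction on x86_64, but they may constrain the compiler's code generation around them differently.

    /* barriers.c -- compile with "gcc -O2 -S barriers.c" and diff the
       assembly of the two consume_* functions. */

    static int flag;
    static int data;

    /* Pure compiler barrier: emits no instruction, only forbids the
       compiler from reordering memory accesses across this point. */
    static inline void rmb_asm(void)
    {
        __asm__ __volatile__ ("" : : : "memory");
    }

    /* GCC builtin acquire fence: also expected to be a no-op
       instruction on x86_64, but it goes through the compiler's
       atomics machinery rather than a plain asm statement. */
    static inline void rmb_builtin(void)
    {
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    }

    int consume_with_asm(void)
    {
        while (!__atomic_load_n(&flag, __ATOMIC_RELAXED))
            ;               /* spin until the producer sets the flag */
        rmb_asm();          /* order the flag read before the data read */
        return data;
    }

    int consume_with_builtin(void)
    {
        while (!__atomic_load_n(&flag, __ATOMIC_RELAXED))
            ;
        rmb_builtin();
        return data;
    }

Diffing the generated assembly of the two consume_* functions is one way to see whether the difference lies in emitted instructions or purely in how the optimizer treats the surrounding loads.)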
-----Original Message-----
From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Sent: Thursday, April 1, 2021 8:31 AM
To: Raut, S Biplab <biplab.r...@amd.com>
Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?

I really had no time to investigate this.

A quick test is to apply the patch in the inline comment at https://github.com/open-mpi/ompi/pull/8622#issuecomment-800776864 and see whether it helps.

If not, I would recommend you try Open MPI 3.1.6 (after manually applying https://github.com/open-mpi/ompi/pull/8624.patch) and see whether there is a performance regression between 3.1.1 and (patched) 3.1.6.

Cheers,

Gilles

On Thu, Apr 1, 2021 at 11:25 AM Raut, S Biplab <biplab.r...@amd.com> wrote:
>
> Dear Gilles,
>             Did you get a chance to look into my mail content below?
> I find the regression is not completely fixed.
>
> With Regards,
> S. Biplab Raut
>
> -----Original Message-----
> From: Raut, S Biplab
> Sent: Wednesday, March 24, 2021 11:32 PM
> To: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> Subject: RE: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
>
> Dear Gilles,
>             After applying the patch below, I thoroughly tested various test cases of FFTW using its in-built benchmark test program.
> Many of the test cases that previously showed regression compared to openMPI3.1.1 have now improved, with positive gains.
> However, there are still a few test cases where the performance is lower than openMPI3.1.1.
> Are there more performance issues in openMPI4.x that need to be discovered?
>
> Please check the details below.
>
> 1) For problem size 1024x1024x512 :-
> $ mpirun --map-by core --rank-by core --bind-to core ./fftw/mpi/mpi-bench -opatient -r500 -s dcif1024x1024x512
> openMPI3.1.1_stock performance -> 147 MFLOPS
> openMPI4.1.0_stock performance -> 137 MFLOPS
> openMPI4.1.0_patch performance -> 137 MFLOPS
>
> 2) For problem size 512x512x512 :-
> $ mpirun --map-by core --rank-by core --bind-to core ./fftw/mpi/mpi-bench -opatient -r500 -s dcif512x512x512
> openMPI3.1.1_stock performance -> 153 MFLOPS
> openMPI4.1.0_stock performance -> 144 MFLOPS
> openMPI4.1.0_patch performance -> 147 MFLOPS
>
> With Regards,
> S. Biplab Raut
>
> -----Original Message-----
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> Sent: Wednesday, March 17, 2021 11:14 AM
> To: Raut, S Biplab <biplab.r...@amd.com>
> Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
>
> The patch has been merged into the v4.1.x release branch, but 4.1.1rc2 has not yet been released.
> Your best bet is to download and apply the patch at https://github.com/open-mpi/ompi/pull/8623.patch (since this does not involve any configury stuff, the process should be painless)
>
> Cheers,
>
> Gilles
>
> On Wed, Mar 17, 2021 at 2:31 PM Raut, S Biplab <biplab.r...@amd.com> wrote:
> >
> > Dear Gilles,
> >             Thank you for your support and quick fix for this issue.
> > Could you tell me if the fix is finally merged, and how do I get the RC version of this code (v4.1)?
> > Please point me to the exact link; it will be helpful (since it will be used on production servers).
> >
> > With Regards,
> > S. Biplab Raut
> >
> > -----Original Message-----
> > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > Sent: Sunday, March 14, 2021 4:18 PM
> > To: Raut, S Biplab <biplab.r...@amd.com>
> > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> >
> > This is something you can/have to do by yourself:
> > log into github, open the issue and click the Subscribe button in Notifications
> >
> > Cheers,
> >
> > Gilles
> >
> > On Sun, Mar 14, 2021 at 7:30 PM Raut, S Biplab <biplab.r...@amd.com> wrote:
> > >
> > > Thank you very much for your support.
> > > Can you please add me to this issue/ticket as a watcher/stake-holder?
> > >
> > > With Regards,
> > > S. Biplab Raut
> > >
> > > -----Original Message-----
> > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > Sent: Sunday, March 14, 2021 3:23 PM
> > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > >
> > > Glad to hear we are finally on the same page!
> > > I filed an issue at https://github.com/open-mpi/ompi/issues/8603,
> > > let's follow up there from now on
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > On Sun, Mar 14, 2021 at 5:38 PM Raut, S Biplab <biplab.r...@amd.com> wrote:
> > > >
> > > > Dear Gilles,
> > > >             Reposting your comments along with my replies on the mailing list for everybody to view/react.
> > > >
> > > > "I am seeing some important performance degradation between Open MPI 3.1.1 and the top of the v3.1.x branch when running on a large number of cores.
> > > > Same performance between 4.1.0 and the top of v3.1.x.
> > > > I am now running git bisect to find out when this started happening."
> > > >
> > > > I am finally feeling relieved and happy that you could reproduce and acknowledge this regression!!
> > > > Do I need to file a bug officially anywhere?
> > > >
> > > > "IIRC, I noted an xpmem error in your logs (that means xpmem is not used).
> > > > The root cause could be that the xpmem kernel module is not loaded, or the permissions on the device are incorrect.
> > > > As Nathan pointed out, xpmem is likely to get the best performance, so while I am running git bisect, I do invite you to fix your xpmem issue and see how this impacts performance."
> > > >
> > > > Sure, I will try to fix the xpmem error and check the impact on the performance.
> > > >
> > > > With Regards,
> > > > S. Biplab Raut
> > > >
> > > > -----Original Message-----
> > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > Sent: Sunday, March 14, 2021 8:45 AM
> > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > > >
> > > > I am seeing some important performance degradation between Open MPI 3.1.1 and the top of the v3.1.x branch when running on a large number of cores.
> > > > Same performance between 4.1.0 and the top of v3.1.x.
> > > >
> > > > I am now running git bisect to find out when this started happening.
> > > >
> > > > IIRC, I noted an xpmem error in your logs (that means xpmem is not used).
> > > > The root cause could be that the xpmem kernel module is not loaded, or the permissions on the device are incorrect.
> > > > As Nathan pointed out, xpmem is likely to get the best performance, so while I am running git bisect, I do invite you to fix your xpmem issue and see how this impacts performance.
> > > >
> > > > Cheers,
> > > >
> > > > Gilles
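(A quick way to check both of the suspected xpmem causes on a node. This is a generic sketch; the module and device names assume a standard xpmem installation and are not taken from this thread:

    $ lsmod | grep xpmem      # is the xpmem kernel module loaded?
    $ ls -l /dev/xpmem        # does the device node exist, and does the
                              #   user have read/write permission on it?
    $ sudo modprobe xpmem     # load the module if it is missing

If the device exists but is root-only, adjusting its permissions, for example via a udev rule, should make the "Could not generate an xpmem segment id" warning quoted further below go away.)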
> > > > On Sat, Mar 13, 2021 at 12:08 AM Raut, S Biplab <biplab.r...@amd.com> wrote:
> > > > >
> > > > > Dear Gilles,
> > > > >             Please check my replies inline.
> > > > >
> > > > > >>> Can you please post the output of
> > > > > >>> ompi_info --param btl vader --level 3
> > > > > >>> with both Open MPI 3.1 and 4.1?
> > > > >
> > > > > openMPI3.1.1
> > > > > ------------------
> > > > > $ ompi_info --param btl vader --level 3
> > > > >     MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.1)
> > > > >     MCA btl vader: ---------------------------------------------------
> > > > >     MCA btl vader: parameter "btl_vader_single_copy_mechanism" (current value: "cma", data source: default, level: 3 user/all, type: int)
> > > > >                    Single copy mechanism to use (defaults to best available)
> > > > >                    Valid values: 1:"cma", 3:"none"
> > > > >
> > > > > openMPI4.1.0
> > > > > ------------------
> > > > > $ ompi_info --param btl vader --level 3
> > > > >     MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.1.0)
> > > > >     MCA btl vader: ---------------------------------------------------
> > > > >     MCA btl vader: parameter "btl_vader_single_copy_mechanism" (current value: "cma", data source: default, level: 3 user/all, type: int)
> > > > >                    Single copy mechanism to use (defaults to best available)
> > > > >                    Valid values: 1:"cma", 4:"emulated", 3:"none"
> > > > >     MCA btl vader: parameter "btl_vader_backing_directory" (current value: "/dev/shm", data source: default, level: 3 user/all, type: string)
> > > > >                    Directory to place backing files for shared memory communication. This directory should be on a local filesystem such as /tmp or /dev/shm (default: (linux) /dev/shm, (others) session directory)
> > > > >
> > > > > >>> What if you run with only 2 MPI ranks?
> > > > > >>> do you observe similar performance differences between Open MPI 3.1 and 4.1?
> > > > >
> > > > > When I run only 2 MPI ranks, the performance regression is not significant.
> > > > > openMPI3.1.1 gives MFLOPS: 11122
> > > > > openMPI4.1.0 gives MFLOPS: 11041
> > > > >
> > > > > With Regards,
> > > > > S. Biplab Raut
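(Since btl_vader_single_copy_mechanism shown above is a user-settable parameter, one way to test whether the single-copy path is implicated in the regression -- an illustrative suggestion, not something proposed in the thread -- is to force it off and rerun the benchmark on both versions:

    $ mpirun --map-by core --rank-by core --bind-to core \
          --mca pml ob1 --mca btl vader,self \
          --mca btl_vader_single_copy_mechanism none \
          ./fftw/mpi/mpi-bench ic1000000

If the gap between openMPI3.1.1 and openMPI4.1.0 narrows with the mechanism disabled, the CMA code path becomes the prime suspect.)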
> > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > > Sent: Friday, March 12, 2021 7:07 PM
> > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > > > >
> > > > > Can you please post the output of
> > > > > ompi_info --param btl vader --level 3
> > > > > with both Open MPI 3.1 and 4.1?
> > > > >
> > > > > What if you run with only 2 MPI ranks?
> > > > > do you observe similar performance differences between Open MPI 3.1 and 4.1?
> > > > >
> > > > > Cheers,
> > > > > Gilles
> > > > >
> > > > > On Fri, Mar 12, 2021 at 6:31 PM Raut, S Biplab <biplab.r...@amd.com> wrote:
> > > > >
> > > > > Dear Gilles,
> > > > >             Thank you for the reply.
> > > > >
> > > > > >>> when running
> > > > > >>> mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > >>> I got similar flops with Open MPI 3.1.1, 3.1.6, 4.1.0 and 4.1.1rc1 on my system
> > > > > >>> If you are using a different command line, please let me know and I will give it a try
> > > > >
> > > > > Although the command line I use is different, I also ran with the above command line as used by you.
> > > > > I still find that openMPI4.1.0 is poor compared to openMPI3.1.1. Please check the details below. I have also provided my system details in case they matter.
> > > > >
> > > > > openMPI3.1.1
> > > > > -------------------
> > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > Problem: ic1000000, setup: 552.20 ms, time: 1.33 ms, ``mflops'': 75143
> > > > >
> > > > > $ ompi_info --all | grep 'command line'
> > > > >     Configure command line: '--prefix=/home/server/ompi3/gcc' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > >     User-specified command line parameters passed to ROMIO's configure script
> > > > >     Complete set of command line parameters passed to ROMIO's configure script
> > > > >
> > > > > openMPI4.1.0
> > > > > -------------------
> > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > Problem: ic1000000, setup: 557.12 ms, time: 1.75 ms, ``mflops'': 57029
> > > > >
> > > > > $ ompi_info --all | grep 'command line'
> > > > >     Configure command line: '--prefix=/home/server/ompi4_plain' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > >     User-specified command line parameters passed to ROMIO's configure script
> > > > >     Complete set of command line parameters passed to ROMIO's configure script
> > > > >
> > > > > openMPI4.1.0 + xpmem
> > > > > --------------------------------
> > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > --------------------------------------------------------------------------
> > > > > WARNING: Could not generate an xpmem segment id for this process'
> > > > > address space.
> > > > > The vader shared memory BTL will fall back on another single-copy
> > > > > mechanism if one is available. This may result in lower performance.
> > > > >
> > > > >   Local host: lib-daytonax-03
> > > > >   Error code: 2 (No such file or directory)
> > > > > --------------------------------------------------------------------------
> > > > > Problem: ic1000000, setup: 559.55 ms, time: 1.77 ms, ``mflops'': 56280
> > > > >
> > > > > $ ompi_info --all | grep 'command line'
> > > > >     Configure command line: '--prefix=/home/server/ompi4_xmem' '--with-xpmem=/opt/xpmm' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > >     User-specified command line parameters passed to ROMIO's configure script
> > > > >     Complete set of command line parameters passed to ROMIO's configure script
> > > > >
> > > > > Other System Config
> > > > > ----------------------------
> > > > > $ cat /etc/os-release
> > > > > NAME="Ubuntu"
> > > > > VERSION="20.04 LTS (Focal Fossa)"
> > > > >
> > > > > $ gcc -v
> > > > > gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
> > > > >
> > > > > DRAM: 1TB DDR4-3200 MT/s RDIMM memory
> > > > >
> > > > > The recommended command line to run would be as below:
> > > > > mpirun --map-by core --rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench -owisdom -opatient -r1000 -s icf1000000
> > > > > (Here, -opatient allows the use of the best kernel/algorithm plan,
> > > > >  -r1000 runs the test for 1000 iterations to avoid run-to-run variations, and
> > > > >  -owisdom takes off the first-time setup overhead/time when executing the "mpirun command line" the next time)
> > > > >
> > > > > Please let me know if any other details are needed for you to analyze this performance regression.
> > > > >
> > > > > With Regards,
> > > > > S. Biplab Raut
> > > > > -----Original Message-----
> > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > > Sent: Friday, March 12, 2021 12:46 PM
> > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > > > >
> > > > > when running
> > > > > mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > I got similar flops with Open MPI 3.1.1, 3.1.6, 4.1.0 and 4.1.1rc1 on my system
> > > > >
> > > > > If you are using a different command line, please let me know and I will give it a try
> > > > >
> > > > > Cheers,
> > > > > Gilles
> > > > >
> > > > > On Fri, Mar 12, 2021 at 3:20 PM Raut, S Biplab <biplab.r...@amd.com> wrote:
> > > > >
> > > > > Reposting here without the logs - it seems there is a message size limit of 150KB, so I could not attach the logs.
> > > > > (Request the moderator to approve the original mail that has the attachment of compressed logs)
> > > > >
> > > > > My main concern in moving from ompi3.1.1 to ompi4.1.0 - why does ompi4.1.0 perform poorly as compared to ompi3.1.1 for some test sizes???
> > > > >
> > > > > I ran the "FFTW MPI bench binary" in verbose mode "10" (as suggested by Gilles) for the three cases below and confirmed that btl/vader is used by default.
> > > > > The FFTW MPI test for a 1D problem size (1000000) is run on a single node as below:
> > > > > mpirun --map-by core --rank-by core --bind-to core -np 128 <fftw/mpi/bench program binary> <program binary options for problem size 1000000>
> > > > >
> > > > > The three test cases are described below; the test run with openMPI3.1.1 performs best.
> > > > > Test run on Ubuntu20.04 and stock openMPI3.1.1: gives mflops: 76978
> > > > > Test run on Ubuntu20.04 and stock openMPI4.1.1: gives mflops: 56205
> > > > > Test run on Ubuntu20.04 and openMPI4.1.1 configured with xpmem: gives mflops: 56411
> > > > >
> > > > > Please check more details in the mail chain below.
> > > > > P.S:
> > > > > The FFTW MPI bench test binary can be compiled from the sources at https://github.com/amd/amd-fftw or https://github.com/FFTW/fftw3 .
> > > > >
> > > > > With Regards,
> > > > > S. Biplab Raut
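(The "verbose mode 10" mentioned above refers to the BTL verbosity level. A generic way to confirm which BTL is selected -- illustrative, using the standard btl_base_verbose MCA parameter rather than a command quoted in this thread -- is:

    $ mpirun --mca btl_base_verbose 10 -np 2 ./mpi-bench ic1000000 2>&1 | grep -i btl

The component-selection messages printed on stderr show which BTLs were opened and chosen for the run.)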