Are you using the gcc provided by Ubuntu 20.04?
If not, which compiler (vendor and version) are you using?

My (light) understanding is that this patch should not impact performance,
so I am not sure whether the restored performance is something I do not
understand, or the side effect of a compiler bug.
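
For reference, here is a minimal sketch (not the actual Open MPI source; the
wrapper functions are only for illustration) of the two constructs being
compared by the inline patch:

    /* Minimal sketch, not the actual Open MPI code. On x86_64 with gcc,
     * both variants are expected to emit no fence instruction; they only
     * constrain compiler reordering. */
    static inline void fence_acquire (void)
    {
        __atomic_thread_fence (__ATOMIC_ACQUIRE);   /* C11-style acquire fence */
    }

    static inline void compiler_barrier (void)
    {
        __asm__ __volatile__ ("" : : : "memory");   /* plain compiler barrier */
    }

If both compile to no instruction on x86_64, any measured difference would
presumably come from how the surrounding code gets optimized, which is why
I suspect either a misunderstanding on my side or a compiler issue.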

Anyway, I issued https://github.com/open-mpi/ompi/pull/8789 and asked for a review.

Cheers,

Gilles

----- Original Message -----
> Dear Gilles,
>                     As per your suggestion, I tried the inline patch as discussed in https://github.com/open-mpi/ompi/pull/8622#issuecomment-800776864 .
> 
> This has fixed the regression completely for the remaining test cases in the FFTW MPI in-built test bench - which was persisting even after using the git patch https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/8623.patch as merged by you.
> So, it seems there is a performance difference between asm volatile("" : : : "memory"); and __atomic_thread_fence (__ATOMIC_ACQUIRE) on x86_64.
> 
> I would request you to please make this change and merge it to the respective openMPI branches - please let me know, if possible, whenever that takes place.
> I also request you to plan for an early 4.1.1rc2 release, at least by June 2021.
> 
> With Regards,
> S. Biplab Raut 
> 
> -----Original Message-----
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com> 
> Sent: Thursday, April 1, 2021 8:31 AM
> To: Raut, S Biplab <biplab.r...@amd.com>
> Subject: Re: [OMPI users] Stable and performant openMPI version for 
Ubuntu20.04 ?
> 
> [CAUTION: External Email]
> 
> I really had no time to investigate this.
> 
> A quick test is to apply the patch in the inline comment at
> https://github.com/open-mpi/ompi/pull/8622#issuecomment-800776864
> and see whether it helps.
> 
> If not, I would recommend you try Open MPI 3.1.6 (after manually applying https://github.com/open-mpi/ompi/pull/8624.patch) and see whether there is a performance regression between 3.1.1 and (patched) 3.1.6
> 
> Cheers,
> 
> Gilles
> 
> On Thu, Apr 1, 2021 at 11:25 AM Raut, S Biplab <biplab.r...@amd.com> 
wrote:
> >
> > Dear Gilles,
> >                      Did you get a chance to look into the content of my mail below?
> > I find the regression is not completely fixed.
> >
> > With Regards,
> > S. Biplab Raut
> >
> > -----Original Message-----
> > From: Raut, S Biplab
> > Sent: Wednesday, March 24, 2021 11:32 PM
> > To: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > Subject: RE: [OMPI users] Stable and performant openMPI version for 
Ubuntu20.04 ?
> >
> > Dear Gilles,
> >                     After applying the below patch, I thoroughly tested various test cases of FFTW using its in-built benchmark test program.
> > Many of the test cases that previously showed regression compared to openMPI3.1.1 have now improved, with positive gains.
> > However, there are still a few test cases where the performance is lower than openMPI3.1.1.
> > Are there more performance issues in openMPI4.x that need to be discovered?
> >
> > Please check the below details.
> >
> > 1) For problem size 1024x1024x512 :-
> >      $   mpirun --map-by core --rank-by core --bind-to core  ./fftw/mpi/mpi-bench -opatient -r500 -s dcif1024x1024x512
> >      openMPI3.1.1_stock performance -> 147 MFLOPS
> >      openMPI4.1.0_stock performance -> 137 MFLOPS
> >      openMPI4.1.0_patch performance -> 137 MFLOPS
> > 2) For problem size 512x512x512 :-
> >      $   mpirun --map-by core --rank-by core --bind-to core  ./fftw/mpi/mpi-bench -opatient -r500 -s dcif512x512x512
> >      openMPI3.1.1_stock performance -> 153 MFLOPS
> >      openMPI4.1.0_stock performance -> 144 MFLOPS
> >      openMPI4.1.0_patch performance -> 147 MFLOPS
> >
> > With Regards,
> > S. Biplab Raut
> >
> > -----Original Message-----
> > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > Sent: Wednesday, March 17, 2021 11:14 AM
> > To: Raut, S Biplab <biplab.r...@amd.com>
> > Subject: Re: [OMPI users] Stable and performant openMPI version for 
Ubuntu20.04 ?
> >
> > [CAUTION: External Email]
> >
> > The patch has been merged into the v4.1.x release branch, but 4.1.1rc2 has not yet been released.
> > Your best bet is to download and apply the patch at
> > https://github.com/open-mpi/ompi/pull/8623.patch
> > (since this does not involve any configury stuff, the process should be painless)
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wed, Mar 17, 2021 at 2:31 PM Raut, S Biplab <biplab.r...@amd.com> 
wrote:
> > >
> > > Dear Gilles,
> > >                      Thank you for your support and the quick fix for this issue.
> > > Could you tell me if the fix is finally merged, and how do I get the RC version of this code (v4.1)?
> > > Please point me to the exact link; it will be helpful (since it will be used on production servers).
> > >
> > > With Regards,
> > > S. Biplab Raut
> > >
> > > -----Original Message-----
> > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > Sent: Sunday, March 14, 2021 4:18 PM
> > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > Subject: Re: [OMPI users] Stable and performant openMPI version 
for Ubuntu20.04 ?
> > >
> > > [CAUTION: External Email]
> > >
> > > This is something you can/have to do by yourself:
> > > log into GitHub, open the issue, and click the Subscribe button in Notifications.
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > > On Sun, Mar 14, 2021 at 7:30 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > >
> > > > Thank you very much for your support.
> > > > Can you please add me to this issue/ticket as a watcher/stakeholder?
> > > >
> > > > With Regards,
> > > > S. Biplab Raut
> > > >
> > > > -----Original Message-----
> > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > Sent: Sunday, March 14, 2021 3:23 PM
> > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > Subject: Re: [OMPI users] Stable and performant openMPI version 
for Ubuntu20.04 ?
> > > >
> > > > [CAUTION: External Email]
> > > >
> > > > Glad we are finally on the same page!
> > > >
> > > > I filed an issue at
> > > > https://github.com/open-mpi/ompi/issues/8603,
> > > > let's follow up there from now on
> > > >
> > > > Cheers,
> > > >
> > > > Gilles
> > > >
> > > > On Sun, Mar 14, 2021 at 5:38 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > >
> > > > > Dear Gilles,
> > > > >
> > > > >                      Reposting your comments along with my replies on the mailing list for everybody to view/react.
> > > > >
> > > > >
> > > > >
> > > > > I am seeing some important performance degradation between Open MPI
> > > > > 3.1.1 and the top of the v3.1.x branch
> > > > > when running on a large number of cores.
> > > > >
> > > > > Same performance between 4.1.0 and the top of v3.1.x
> > > > >
> > > > > I am now running git bisect to find out when this started happening.
> > > > >
> > > > > I am finally feeling relieved and happy that you could reproduce and acknowledge this regression !!
> > > > >
> > > > > Do I need to file any bug officially anywhere?
> > > > >
> > > > >
> > > > >
> > > > > IIRC, I noted an xpmem error in your logs (that means xpmem is not used).
> > > > >
> > > > > The root cause could be that the xpmem kernel module is not loaded,
> > > > > or the permissions on the device are incorrect. As Nathan pointed out,
> > > > > xpmem is likely to give the best performance, so while I am running
> > > > > git bisect, I do invite you to fix your xpmem issue and see how this
> > > > > impacts performance.
> > > > >
> > > > > Sure, I will try to fix the xpmem error and check the impact on the performance.
> > > > >
> > > > >
> > > > >
> > > > > With Regards,
> > > > >
> > > > > S. Biplab Raut
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > > Sent: Sunday, March 14, 2021 8:45 AM
> > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > > Subject: Re: [OMPI users] Stable and performant openMPI 
version for Ubuntu20.04 ?
> > > > >
> > > > >
> > > > >
> > > > > [CAUTION: External Email]
> > > > >
> > > > >
> > > > >
> > > > > I am seeing some important performance degradation between Open MPI
> > > > > 3.1.1 and the top of the v3.1.x branch
> > > > > when running on a large number of cores.
> > > > >
> > > > > Same performance between 4.1.0 and the top of v3.1.x
> > > > >
> > > > > I am now running git bisect to find out when this started happening.
> > > > >
> > > > >
> > > > >
> > > > > IIRC, I noted an xpmem error in your logs (that means xpmem is not used).
> > > > >
> > > > > The root cause could be that the xpmem kernel module is not loaded,
> > > > > or the permissions on the device are incorrect. As Nathan pointed out,
> > > > > xpmem is likely to give the best performance, so while I am running
> > > > > git bisect, I do invite you to fix your xpmem issue and see how this
> > > > > impacts performance.
> > > > >
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > >
> > > > >
> > > > > Gilles
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Mar 13, 2021 at 12:08 AM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > >
> > > > > >
> > > > >
> > > > > > Dear Gilles,
> > > > >
> > > > > >
> > > > >
> > > > > >                     Please check my replies inline.
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > >>> Can you please post the output of
> > > > >
> > > > > >
> > > > >
> > > > > > >>> ompi_info --param btl vader --level 3
> > > > >
> > > > > >
> > > > >
> > > > > > >>> with both Open MPI 3.1 and 4.1?
> > > > >
> > > > > >
> > > > >
> > > > > > openMPI3.1.1
> > > > > > ------------------
> > > > > > $ ompi_info --param btl vader --level 3
> > > > > >                  MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.1)
> > > > > >            MCA btl vader: ---------------------------------------------------
> > > > > >            MCA btl vader: parameter "btl_vader_single_copy_mechanism"
> > > > > >                           (current value: "cma", data source: default, level:
> > > > > >                           3 user/all, type: int)
> > > > > >                           Single copy mechanism to use (defaults to best
> > > > > >                           available)
> > > > > >                           Valid values: 1:"cma", 3:"none"
> > > > >
> > > > > >
> > > > >
> > > > > > openMPI4.1.0
> > > > > > ------------------
> > > > > > $ ompi_info --param btl vader --level 3
> > > > > >                  MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.1.0)
> > > > > >            MCA btl vader: ---------------------------------------------------
> > > > > >            MCA btl vader: parameter "btl_vader_single_copy_mechanism"
> > > > > >                           (current value: "cma", data source: default, level:
> > > > > >                           3 user/all, type: int)
> > > > > >                           Single copy mechanism to use (defaults to best
> > > > > >                           available)
> > > > > >                           Valid values: 1:"cma", 4:"emulated", 3:"none"
> > > > > >            MCA btl vader: parameter "btl_vader_backing_directory" (current
> > > > > >                           value: "/dev/shm", data source: default, level: 3
> > > > > >                           user/all, type: string)
> > > > > >                           Directory to place backing files for shared memory
> > > > > >                           communication. This directory should be on a local
> > > > > >                           filesystem such as /tmp or /dev/shm (default:
> > > > > >                           (linux) /dev/shm, (others) session directory)
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > >>> What if you run with only 2 MPI ranks?
> > > > > > >>> do you observe similar performance differences between Open MPI 3.1 and 4.1?
> > > > >
> > > > > > When I run only 2 MPI ranks, the performance regression is not significant.
> > > > > > openMPI3.1.1 gives MFLOPS: 11122
> > > > > > openMPI4.1.0 gives MFLOPS: 11041
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > With Regards,
> > > > >
> > > > > >
> > > > >
> > > > > > S. Biplab Raut
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > >
> > > > > > Sent: Friday, March 12, 2021 7:07 PM
> > > > >
> > > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > >
> > > > > > Subject: Re: [OMPI users] Stable and performant openMPI 
version for Ubuntu20.04 ?
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > [CAUTION: External Email]
> > > > >
> > > > > >
> > > > >
> > > > > > Can you please post the output of
> > > > >
> > > > > >
> > > > >
> > > > > > ompi_info --param btl vader --level 3
> > > > >
> > > > > >
> > > > >
> > > > > > with both Open MPI 3.1 and 4.1?
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > What if you run with only 2 MPI ranks?
> > > > >
> > > > > >
> > > > >
> > > > > > do you observe similar performance differences between Open 
MPI 3.1 and 4.1?
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > Cheers,
> > > > >
> > > > > >
> > > > >
> > > > > > Gilles
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > On Fri, Mar 12, 2021 at 6:31 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > >
> > > > > >
> > > > >
> > > > > > Dear Gilles,
> > > > >
> > > > > >
> > > > >
> > > > > >                     Thank you for the reply.
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > >>> when running
> > > > >
> > > > > >
> > > > >
> > > > > > >>> mpirun --map-by core -rank-by core --bind-to core --mca pml ob1
> > > > > > >>> --mca btl vader,self ./mpi-bench ic1000000
> > > > >
> > > > > >
> > > > >
> > > > > > >>> I got similar flops with Open MPI 3.1.1, 3.1.6, 4.1.0 and 4.1.1rc1
> > > > > > >>> on my system
> > > > >
> > > > > >
> > > > >
> > > > > > >>> If you are using a different command line, please let me 
> > > > > > >>> know and
> > > > >
> > > > > > >>> I will give it a try
> > > > >
> > > > > >
> > > > >
> > > > > > Although the command line I use is different, I ran with the above command line as used by you.
> > > > >
> > > > > >
> > > > >
> > > > > > I still find that openMPI4.1.0 performs poorly compared to openMPI3.1.1. Please check the details below. I have also provided my system details in case they matter.
> > > > >
> > > > > >
> > > > >
> > > > > > openMPI3.1.1
> > > > >
> > > > > >
> > > > >
> > > > > > -------------------
> > > > >
> > > > > >
> > > > >
> > > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > >
> > > > > >
> > > > >
> > > > > > Problem: ic1000000, setup: 552.20 ms, time: 1.33 ms, ``mflops'': 75143
> > > > >
> > > > > >
> > > > >
> > > > > > $ ompi_info --all|grep 'command line'
> > > > >
> > > > > >
> > > > >
> > > > > >   Configure command line: '--prefix=/home/server/ompi3/gcc' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > >
> > > > > >
> > > > >
> > > > > >                           User-specified command line 
> > > > > > parameters
> > > > >
> > > > > > passed to ROMIO's configure script
> > > > >
> > > > > >
> > > > >
> > > > > >                           Complete set of command line 
> > > > > > parameters
> > > > >
> > > > > > passed to ROMIO's configure script
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > openMPI4.1.0
> > > > >
> > > > > >
> > > > >
> > > > > > -------------------
> > > > >
> > > > > >
> > > > >
> > > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > >
> > > > > >
> > > > >
> > > > > > Problem: ic1000000, setup: 557.12 ms, time: 1.75 ms, ``mflops'': 57029
> > > > >
> > > > > >
> > > > >
> > > > > > $ ompi_info --all|grep 'command line'
> > > > >
> > > > > >
> > > > >
> > > > > >   Configure command line: '--prefix=/home/server/ompi4_plain' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > >
> > > > > >
> > > > >
> > > > > >                           User-specified command line 
> > > > > > parameters
> > > > >
> > > > > > passed to ROMIO's configure script
> > > > >
> > > > > >
> > > > >
> > > > > >                           Complete set of command line 
> > > > > > parameters
> > > > >
> > > > > > passed to ROMIO's configure script
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > openMPI4.1.0 + xpmem
> > > > >
> > > > > >
> > > > >
> > > > > > --------------------------------
> > > > >
> > > > > >
> > > > >
> > > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > >
> > > > > >
> > > > >
> > > > > > --------------------------------------------------------------------------
> > > > >
> > > > > >
> > > > >
> > > > > > WARNING: Could not generate an xpmem segment id for this process'
> > > > > > address space.
> > > > >
> > > > > >
> > > > >
> > > > > > The vader shared memory BTL will fall back on another single-copy
> > > > > > mechanism if one is available. This may result in lower performance.
> > > > >
> > > > > >
> > > > >
> > > > > >   Local host: lib-daytonax-03
> > > > >
> > > > > >
> > > > >
> > > > > >   Error code: 2 (No such file or directory)
> > > > >
> > > > > >
> > > > >
> > > > > > --------------------------------------------------------------------------
> > > > >
> > > > > >
> > > > >
> > > > > > Problem: ic1000000, setup: 559.55 ms, time: 1.77 ms, ``mflops'': 56280
> > > > >
> > > > > >
> > > > >
> > > > > > $ ompi_info --all|grep 'command line'
> > > > >
> > > > > >
> > > > >
> > > > > >   Configure command line: '--prefix=/home/server/ompi4_xmem' '--with-xpmem=/opt/xpmm' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > >
> > > > > >
> > > > >
> > > > > >                           User-specified command line 
> > > > > > parameters
> > > > >
> > > > > > passed to ROMIO's configure script
> > > > >
> > > > > >
> > > > >
> > > > > >                           Complete set of command line 
> > > > > > parameters
> > > > >
> > > > > > passed to ROMIO's configure script
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > Other System Config
> > > > >
> > > > > >
> > > > >
> > > > > > ----------------------------
> > > > >
> > > > > > -        $ cat /etc/os-release
> > > > >
> > > > > >
> > > > >
> > > > > > NAME="Ubuntu"
> > > > >
> > > > > >
> > > > >
> > > > > > VERSION="20.04 LTS (Focal Fossa)"
> > > > >
> > > > > >
> > > > >
> > > > > > $ gcc -v
> > > > >
> > > > > >
> > > > >
> > > > > > gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
> > > > >
> > > > > >
> > > > >
> > > > > > DRAM:- 1TB DDR4-3200 MT/s RDIMM memory
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > The recommended command line to run would be as below:-
> > > > >
> > > > > >
> > > > >
> > > > > > mpirun --map-by core --rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench -owisdom -opatient -r1000 -s icf1000000
> > > > >
> > > > > >
> > > > >
> > > > > > (Here, -opatient would allow the use of the best kernel/algorithm plan,
> > > > > >             -r1000 would run the test for 1000 iterations to avoid run-to-run variations,
> > > > > >             -owisdom would remove the first-time setup overhead/time when executing the "mpirun command line" next time)
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > Please let me know if you need any other details to analyze this performance regression.
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > With Regards,
> > > > >
> > > > > >
> > > > >
> > > > > > S. Biplab Raut
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > >
> > > > > > Sent: Friday, March 12, 2021 12:46 PM
> > > > >
> > > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > >
> > > > > > Subject: Re: [OMPI users] Stable and performant openMPI 
version for Ubuntu20.04 ?
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > [CAUTION: External Email]
> > > > >
> > > > > >
> > > > >
> > > > > > when running
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > I got similar flops with Open MPI 3.1.1, 3.1.6, 4.1.0 and 4.1.1rc1 on
> > > > > > my system
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > If you are using a different command line, please let me 
know 
> > > > > > and I
> > > > >
> > > > > > will give it a try
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > Cheers,
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > Gilles
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > On Fri, Mar 12, 2021 at 3:20 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > >
> > > > > >
> > > > >
> > > > > > Reposting here without the logs - it seems there is a message size limit of 150KB, so I could not attach the logs.
> > > > >
> > > > > >
> > > > >
> > > > > > (I request the moderator to approve the original mail that has the attachment of compressed logs.)
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > My main concern in moving from ompi3.1.1 to ompi4.1.0 - why does ompi4.1.0 perform poorly compared to ompi3.1.1 for some test sizes?
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > I ran the "FFTW MPI bench binary" in verbose mode "10" (as suggested by Gilles) for the below three cases and confirmed that btl/vader is used by default.
> > > > >
> > > > > >
> > > > >
> > > > > > FFTW MPI test for a 1D problem size (1000000) is run on a single node as below:-
> > > > >
> > > > > > mpirun --map-by core --rank-by core --bind-to core -np 128 <fftw/mpi/bench program binary> <program binary options for problem size 1000000 >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > The three test cases are described below :- The test run with openMPI3.1.1 performs best.
> > > > >
> > > > > > Test run on Ubuntu20.04 and stock openMPI3.1.1 : gives mflops: 76978
> > > > > > Test run on Ubuntu20.04 and stock openMPI4.1.1 : gives mflops: 56205
> > > > > > Test run on Ubuntu20.04 and openMPI4.1.1 configured with xpmem : gives mflops: 56411
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > Please check more details in the below mail chain.
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > P.S:
> > > > >
> > > > > >
> > > > >
> > > > > > FFTW MPI bench test binary can be compiled from sources
> > > > > > https://github.com/amd/amd-fftw OR https://github.com/FFTW/fftw3 .
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > With Regards,
> > > > >
> > > > > >
> > > > >
> > > > > > S. Biplab Raut
> > > > > > >
> 

