Dear Gilles,
                    I am using GCC (v9.3.0) as provided by Ubuntu 20.04.
As I said, only a few test cases were still regressing with the earlier patch, but
this new patch resolves all of them.

With Regards,
S. Biplab Raut

-----Original Message-----
From: gil...@rist.or.jp <gil...@rist.or.jp> 
Sent: Friday, April 9, 2021 5:45 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>; Raut, S Biplab 
<biplab.r...@amd.com>
Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?

Are you using gcc provided by Ubuntu 20.04?
If not, which compiler (vendor and version) are you using?

My (light) understanding is that this patch should not impact performance, so
I am not sure whether the performance coming back is something I do not
understand, or the side effect of a compiler bug.

Anyway, I issued https://github.com/open-mpi/ompi/pull/8789 and asked for a review.

Cheers,

Gilles

----- Original Message -----
> Dear Gilles,
>                     As per your suggestion, I tried the inline patch as discussed in
> https://github.com/open-mpi/ompi/pull/8622#issuecomment-800776864 .
>
> This has fixed the regression completely for the remaining test cases in the
> FFTW MPI in-built test bench - which was persisting even after using the git
> patch https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/8623.patch
> as merged by you.
> So, it seems there is a performance difference between asm volatile("": : :"memory");
> and __atomic_thread_fence (__ATOMIC_ACQUIRE) on x86_64.
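>
> For context, a minimal sketch (assuming GCC on x86_64, helper names chosen
> only for illustration) of the two barrier flavors being compared is below.
> Both act as compiler barriers, and on x86_64 neither is normally expected to
> emit a fence instruction, so any performance difference would come from how
> the compiler schedules code around them:
>
>     /* Pure compiler barrier: no C11 ordering semantics, it only stops the
>      * compiler from moving memory accesses across this point. */
>     static inline void compiler_barrier(void)
>     {
>         asm volatile("" : : : "memory");
>     }
>
>     /* C11-style acquire fence: also a compiler barrier, but with acquire
>      * ordering semantics; on x86_64 it still compiles to no instruction. */
>     static inline void acquire_fence(void)
>     {
>         __atomic_thread_fence(__ATOMIC_ACQUIRE);
>     }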
>
> I would request you to please make this change and merge it into the
> respective openMPI branches - please let me know, if possible, whenever that
> takes place.
> I also request you to plan for an early 4.1.1rc2 release, at least by June 2021.
>
> With Regards,
> S. Biplab Raut
>
> -----Original Message-----
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> Sent: Thursday, April 1, 2021 8:31 AM
> To: Raut, S Biplab <biplab.r...@amd.com>
> Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
>
> I really had no time to investigate this.
>
> A quick test is to apply the patch in the inline comment at
> https://github.com/open-mpi/ompi/pull/8622#issuecomment-800776864
> and see whether it helps.
>
> If not, I would recommend you try Open MPI 3.1.6 (after manually applying
> https://github.com/open-mpi/ompi/pull/8624.patch) and see whether there is a
> performance regression between 3.1.1 and (patched) 3.1.6
>
> Cheers,
>
> Gilles
>
> On Thu, Apr 1, 2021 at 11:25 AM Raut, S Biplab <biplab.r...@amd.com> wrote:
> >
> > Dear Gilles,
> >                      Did you get a chance to look into my mail content below?
> > I find the regression is not completely fixed.
> >
> > With Regards,
> > S. Biplab Raut
> >
> > -----Original Message-----
> > From: Raut, S Biplab
> > Sent: Wednesday, March 24, 2021 11:32 PM
> > To: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > Subject: RE: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> >
> > Dear Gilles,
> >                     After applying the below patch, I thoroughly tested
> > various test cases of FFTW using its in-built benchmark test program.
> > Many of the test cases that previously showed regression compared to
> > openMPI3.1.1 have now improved, with positive gains.
> > However, there are still a few test cases where the performance is lower
> > than openMPI3.1.1.
> > Are there more performance issues in openMPI4.x that need to be discovered?
> >
> > Please check the below details.
> >
> > 1) For problem size 1024x1024x512 :-
> >      $   mpirun --map-by core --rank-by core --bind-to core  ./fftw/mpi/mpi-bench -opatient -r500 -s dcif1024x1024x512
> >      openMPI3.1.1_stock performance -> 147 MFLOPS
> >      openMPI4.1.0_stock performance -> 137 MFLOPS
> >      openMPI4.1.0_patch performance -> 137 MFLOPS
> > 2) For problem size 512x512x512 :-
> >      $   mpirun --map-by core --rank-by core --bind-to core  ./fftw/mpi/mpi-bench -opatient -r500 -s dcif512x512x512
> >      openMPI3.1.1_stock performance -> 153 MFLOPS
> >      openMPI4.1.0_stock performance -> 144 MFLOPS
> >      openMPI4.1.0_patch performance -> 147 MFLOPS
> >
> > With Regards,
> > S. Biplab Raut
> >
> > -----Original Message-----
> > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > Sent: Wednesday, March 17, 2021 11:14 AM
> > To: Raut, S Biplab <biplab.r...@amd.com>
> > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> >
> > The patch has been merged into the v4.1.x release branch, but 4.1.1rc2 has
> > not been released yet.
> > Your best bet is to download and apply the patch at
> > https://github.com/open-mpi/ompi/pull/8623.patch (since this does not
> > involve any configury stuff, the process should be painless)
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wed, Mar 17, 2021 at 2:31 PM Raut, S Biplab <biplab.r...@amd.com> wrote:
> > >
> > > Dear Gilles,
> > >                      Thank you for your support and the quick fix for this issue.
> > > Could you tell me if the fix is finally merged, and how do I get the RC
> > > version of this code (v4.1)?
> > > Please point me to the exact link; it will be helpful (since it will be
> > > used in production servers).
> > >
> > > With Regards,
> > > S. Biplab Raut
> > >
> > > -----Original Message-----
> > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > Sent: Sunday, March 14, 2021 4:18 PM
> > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > >
> > > This is something you can/have to do by yourself:
> > > log into GitHub, open the issue, and click the Subscribe button under
> > > Notifications
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > On Sun, Mar 14, 2021 at 7:30 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > >
> > > > Thank you very much for your support.
> > > > Can you please add me to this issue/ticket as a watcher/stakeholder?
> > > >
> > > > With Regards,
> > > > S. Biplab Raut
> > > >
> > > > -----Original Message-----
> > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > Sent: Sunday, March 14, 2021 3:23 PM
> > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > > >
> > > > Glad too that we are finally on the same page!
> > > >
> > > > I filed an issue at https://github.com/open-mpi/ompi/issues/8603,
> > > > let's follow up here from now on.
> > > >
> > > > Cheers,
> > > >
> > > > Gilles
> > > >
> > > > On Sun, Mar 14, 2021 at 5:38 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > >
> > > > > Dear Gilles,
> > > > >                      Reposting your comments along with my replies in the mailing list for everybody to view/react.
> > > > >
> > > > > "I am seeing some important performance degradation between Open MPI 3.1.1
> > > > > and the top of the v3.1.x branch when running on a large number of cores.
> > > > > Same performance between 4.1.0 and the top of v3.1.x
> > > > > I am now running git bisect to find out when this started happening."
> > > > >
> > > > > I am finally feeling relieved and happy that you could reproduce and acknowledge this regression!!
> > > > > Do I need to file any bug officially anywhere?
> > > > >
> > > > > "IIRC, I noted an xpmem error in your logs (that means xpmem is not used).
> > > > > The root cause could be that the xpmem kernel module is not loaded, or the
> > > > > permissions on the device are incorrect. As Nathan pointed out, xpmem is
> > > > > likely to get the best performance, so while I am running git bisect, I do
> > > > > invite you to fix your xpmem issue and see how this impacts performance"
> > > > >
> > > > > Sure, I will try to fix the xpmem error and check the impact on the performance.
> > > > >
> > > > > With Regards,
> > > > > S. Biplab Raut
> > > > >
> > > > > -----Original Message-----
> > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > > Sent: Sunday, March 14, 2021 8:45 AM
> > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > > > >
> > > > > I am seeing some important performance degradation between Open MPI 3.1.1
> > > > > and the top of the v3.1.x branch when running on a large number of cores.
> > > > > Same performance between 4.1.0 and the top of v3.1.x
> > > > >
> > > > > I am now running git bisect to find out when this started happening.
> > > > >
> > > > > IIRC, I noted an xpmem error in your logs (that means xpmem is not used).
> > > > > The root cause could be that the xpmem kernel module is not loaded, or the
> > > > > permissions on the device are incorrect. As Nathan pointed out, xpmem is
> > > > > likely to get the best performance, so while I am running git bisect, I do
> > > > > invite you to fix your xpmem issue and see how this impacts performance
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Gilles
> > > > >
> > > > > On Sat, Mar 13, 2021 at 12:08 AM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > >
> > > > > >
> > > > > > Dear Gilles,
> > > > > >                     Please check my replies inline.
> > > > > >
> > > > > > >>> Can you please post the output of
> > > > > > >>> ompi_info --param btl vader --level 3
> > > > > > >>> with both Open MPI 3.1 and 4.1?
> > > > > >
> > > > > > openMPI3.1.1
> > > > > > ------------------
> > > > > > $ ompi_info --param btl vader --level 3
> > > > > >                  MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.1)
> > > > > >            MCA btl vader: ---------------------------------------------------
> > > > > >            MCA btl vader: parameter "btl_vader_single_copy_mechanism"
> > > > > >                           (current value: "cma", data source: default, level:
> > > > > >                           3 user/all, type: int)
> > > > > >                           Single copy mechanism to use (defaults to best
> > > > > >                           available)
> > > > > >                           Valid values: 1:"cma", 3:"none"
> > > > > >
> > > > > > openMPI4.1.0
> > > > > > ------------------
> > > > > > $ ompi_info --param btl vader --level 3
> > > > > >                  MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.1.0)
> > > > > >            MCA btl vader: ---------------------------------------------------
> > > > > >            MCA btl vader: parameter "btl_vader_single_copy_mechanism"
> > > > > >                           (current value: "cma", data source: default, level:
> > > > > >                           3 user/all, type: int)
> > > > > >                           Single copy mechanism to use (defaults to best
> > > > > >                           available)
> > > > > >                           Valid values: 1:"cma", 4:"emulated", 3:"none"
> > > > > >            MCA btl vader: parameter "btl_vader_backing_directory" (current
> > > > > >                           value: "/dev/shm", data source: default, level: 3
> > > > > >                           user/all, type: string)
> > > > > >                           Directory to place backing files for shared memory
> > > > > >                           communication. This directory should be on a local
> > > > > >                           filesystem such as /tmp or /dev/shm (default:
> > > > > >                           (linux) /dev/shm, (others) session directory)
> > > > > >
> > > > > > >>> What if you run with only 2 MPI ranks?
> > > > > > >>> do you observe similar performance differences between Open MPI 3.1 and 4.1?
> > > > > >
> > > > > > When I run only 2 MPI ranks, the performance regression is not significant.
> > > > > > openMPI3.1.1 gives MFLOPS: 11122
> > > > > > openMPI4.1.0 gives MFLOPS: 11041
> > > > > >
> > > > > > With Regards,
> > > > > > S. Biplab Raut
> > > > > >
> > > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > > > Sent: Friday, March 12, 2021 7:07 PM
> > > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > > > > >
> > > > > > Can you please post the output of
> > > > > > ompi_info --param btl vader --level 3
> > > > > > with both Open MPI 3.1 and 4.1?
> > > > > >
> > > > > > What if you run with only 2 MPI ranks?
> > > > > > do you observe similar performance differences between Open MPI 3.1 and 4.1?
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Gilles
> > > > > >
> > > > > > On Fri, Mar 12, 2021 at 6:31 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > > >
> > > > > > Dear Gilles,
> > > > > >                     Thank you for the reply.
> > > > > >
> > > > > > >>> when running
> > > > > > >>> mpirun --map-by core -rank-by core --bind-to core --mca pml ob1
> > > > > > >>> --mca btl vader,self ./mpi-bench ic1000000
> > > > > > >>> I got similar flops with Open MPI 3.1.1, 3.1.6, 4.1.0 and 4.1.1rc1
> > > > > > >>> on my system
> > > > > > >>> If you are using a different command line, please let me know and
> > > > > > >>> I will give it a try
> > > > > >
> > > > > > Although the command line that I use is different, I ran with the above command line as used by you.
> > > > > > I still find that openMPI4.1.0 is poor as compared to openMPI3.1.1. Please check the details below. I have also provided my system details in case it matters.
> > > > > >
> > > > > > openMPI3.1.1
> > > > > > -------------------
> > > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > > Problem: ic1000000, setup: 552.20 ms, time: 1.33 ms, ``mflops'': 75143
> > > > > > $ ompi_info --all|grep 'command line'
> > > > > >   Configure command line: '--prefix=/home/server/ompi3/gcc' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > > >                           User-specified command line parameters passed to ROMIO's configure script
> > > > > >                           Complete set of command line parameters passed to ROMIO's configure script
> > > > > >
> > > > > > openMPI4.1.0
> > > > > > -------------------
> > > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > > Problem: ic1000000, setup: 557.12 ms, time: 1.75 ms, ``mflops'': 57029
> > > > > > $ ompi_info --all|grep 'command line'
> > > > > >   Configure command line: '--prefix=/home/server/ompi4_plain' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > > >                           User-specified command line parameters passed to ROMIO's configure script
> > > > > >                           Complete set of command line parameters passed to ROMIO's configure script
> > > > > >
> > > > > > openMPI4.1.0 + xpmem
> > > > > > --------------------------------
> > > > > > $ mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > > --------------------------------------------------------------------------
> > > > > > WARNING: Could not generate an xpmem segment id for this process'
> > > > > > address space.
> > > > > > The vader shared memory BTL will fall back on another single-copy
> > > > > > mechanism if one is available. This may result in lower performance.
> > > > > >
> > > > > >   Local host: lib-daytonax-03
> > > > > >   Error code: 2 (No such file or directory)
> > > > > > --------------------------------------------------------------------------
> > > > > > Problem: ic1000000, setup: 559.55 ms, time: 1.77 ms, ``mflops'': 56280
> > > > > > $ ompi_info --all|grep 'command line'
> > > > > >   Configure command line: '--prefix=/home/server/ompi4_xmem' '--with-xpmem=/opt/xpmm' '--enable-mpi-fortran' '--enable-mpi-cxx' '--enable-shared=yes' '--enable-static=yes' '--enable-mpi1-compatibility'
> > > > > >                           User-specified command line parameters passed to ROMIO's configure script
> > > > > >                           Complete set of command line parameters passed to ROMIO's configure script
> > > > > >
> > > > > > Other System Config
> > > > > > ----------------------------
> > > > > > $ cat /etc/os-release
> > > > > > NAME="Ubuntu"
> > > > > > VERSION="20.04 LTS (Focal Fossa)"
> > > > > > $ gcc -v
> > > > > > gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
> > > > > > DRAM: 1TB DDR4-3200 MT/s RDIMM memory
> > > > > >
> > > > > > The recommended command line to run would be as below:-
> > > > > > mpirun --map-by core --rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench -owisdom -opatient -r1000 -s icf1000000
> > > > > > (Here, -opatient would allow the use of the best kernel/algorithm plan,
> > > > > >        -r1000 would run the test for 1000 iterations to avoid run-to-run variations,
> > > > > >        -owisdom would take off the first-time setup overhead/time when executing the "mpirun command line" next time)
> > > > > >
> > > > > > Please let me know if any other details are needed for you to analyze this performance regression.
> > > > > >
> > > > > > With Regards,
> > > > > > S. Biplab Raut
> > > > > >
> > > > > > From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> > > > > > Sent: Friday, March 12, 2021 12:46 PM
> > > > > > To: Raut, S Biplab <biplab.r...@amd.com>
> > > > > > Subject: Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?
> > > > > >
> > > > > > when running
> > > > > > mpirun --map-by core -rank-by core --bind-to core --mca pml ob1 --mca btl vader,self ./mpi-bench ic1000000
> > > > > > I got similar flops with Open MPI 3.1.1, 3.1.6, 4.1.0 and 4.1.1rc1 on my system
> > > > > >
> > > > > > If you are using a different command line, please let me know and I will give it a try
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Gilles
> > > > > >
> > > > > On Fri, Mar 12, 2021 at 3:20 PM Raut, S Biplab <Biplab.Raut@amd.com> wrote:
> > > > > >
> > > > > > Reposting here without the logs - it seems there is a message size limit of 150KB, and so I could not attach the logs.
> > > > > > (Request the moderator to approve the original mail that has the attachment of compressed logs.)
> > > > > >
> > > > > > My main concern in moving from ompi3.1.1 to ompi4.1.0 - Why does ompi4.1.0 perform poorly as compared to ompi3.1.1 for some test sizes???
> > > > > >
> > > > > > I ran the "FFTW MPI bench binary" in verbose mode "10" (as suggested by Gilles) for the below three cases and confirmed that btl/vader is used by default.
> > > > > > The FFTW MPI test for a 1D problem size (1000000) is run on a single node as below:-
> > > > > > mpirun --map-by core --rank-by core --bind-to core -np 128 <fftw/mpi/bench program binary> <program binary options for problem size 1000000 >
> > > > > >
> > > > > > The three test cases are described below :- The test run with openMPI3.1.1 performs best.
> > > > > > Test run on Ubuntu20.04 and stock openMPI3.1.1 : gives mflops: 76978
> > > > > > Test run on Ubuntu20.04 and stock openMPI4.1.1 : gives mflops: 56205
> > > > > > Test run on Ubuntu20.04 and openMPI4.1.1 configured with xpmem : gives mflops: 56411
> > > > > >
> > > > > > Please check more details in the below mail chain.
> > > > > >
> > > > > > P.S:
> > > > > > The FFTW MPI bench test binary can be compiled from the sources at https://github.com/amd/amd-fftw OR https://github.com/FFTW/fftw3 .
> > > > > >
> > > > > > With Regards,
> > > > > > S. Biplab Raut

