Hi Gilles,

Yes, I am using xpmem, but I am running into the issue below.

https://github.com/open-mpi/ompi/issues/11463

--Arun


From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Sent: Monday, March 6, 2023 2:08 PM
To: Chandran, Arun <arun.chand...@amd.com>
Subject: Re: [OMPI users] What is the best choice of pml and btl for intranode 
communication

If you have not done it already, you can install xpmem or knem in order to 
boost btl/vader performance

you can run

ompi_info --all | grep single_copy_mechanism

to check which mechanism (if any) is used.
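
If a mechanism is reported, you can also select one explicitly. A sketch, 
assuming a build with XPMEM support (the btl_vader_single_copy_mechanism 
parameter accepts values such as cma, xpmem, knem, or emulated, depending on 
how Open MPI was configured):

# force xpmem as the vader single-copy mechanism for this run
mpirun --mca btl_vader_single_copy_mechanism xpmem -np 32 ./perf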


Cheers,

Gilles

On Mon, Mar 6, 2023 at 5:33 PM Chandran, Arun <arun.chand...@amd.com> wrote:

Hi Gilles,

Thanks very much for the information.

I was looking for the best pml + btl combination for intra-node communication 
on a standalone node with a high task count (>= 192) and no HPC-class 
networking installed.

I just realized that I can't use pml/ucx in such cases, as it is unable to 
find IB and fails.

perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf --mca pml ucx
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      lib-ssp-04
  Framework: pml
--------------------------------------------------------------------------
[lib-ssp-04:753542] PML ucx cannot be selected
[lib-ssp-04:753531] PML ucx cannot be selected
[lib-ssp-04:753541] PML ucx cannot be selected
[lib-ssp-04:753539] PML ucx cannot be selected
[lib-ssp-04:753545] PML ucx cannot be selected
[lib-ssp-04:753547] PML ucx cannot be selected
[lib-ssp-04:753572] PML ucx cannot be selected
[lib-ssp-04:753538] PML ucx cannot be selected
[lib-ssp-04:753530] PML ucx cannot be selected
[lib-ssp-04:753537] PML ucx cannot be selected
[lib-ssp-04:753546] PML ucx cannot be selected
[lib-ssp-04:753544] PML ucx cannot be selected
[lib-ssp-04:753570] PML ucx cannot be selected
[lib-ssp-04:753567] PML ucx cannot be selected
[lib-ssp-04:753534] PML ucx cannot be selected
[lib-ssp-04:753592] PML ucx cannot be selected
[lib-ssp-04:753529] PML ucx cannot be selected
<snip>

That means my only choice is pml/ob1 + btl/vader.
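
For example, forcing that combination explicitly with the same benchmark 
invocation as above:

mpirun -np 32 --map-by core --bind-to core --mca pml ob1 --mca btl self,vader ./perf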

--Arun

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Gilles Gouaillardet via users
Sent: Monday, March 6, 2023 12:56 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] What is the best choice of pml and btl for intranode 
communication

Arun,

First, Open MPI selects a pml for **all** the MPI tasks (for example, pml/ucx 
or pml/ob1).

Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct, 
btl/vader) is used for each pair of MPI tasks
(tasks on the same node will use btl/vader, tasks on different nodes will use 
btl/uct)
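
To see which pml and btl components your build actually provides, you can 
run, for instance:

ompi_info | grep -E "MCA (pml|btl)"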

Note that if UCX is available, pml/ucx takes the highest priority, so no btl is 
involved
(in your case, it means intra-node communications will be handled by UCX and 
not by btl/vader).
You can force ob1 and try different combinations of btl with
mpirun --mca pml ob1 --mca btl self,<btl1>,<btl2> ...
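
For example, a sketch comparing shared memory against TCP loopback on a single 
node (btl names here are illustrative; use the ones your build provides):

# shared memory only
mpirun --mca pml ob1 --mca btl self,vader -np 32 ./perf
# TCP loopback, for comparison
mpirun --mca pml ob1 --mca btl self,tcp -np 32 ./perf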

I expect pml/ucx to be faster than pml/ob1 with btl/uct for inter-node 
communication.

I have not benchmarked Open MPI for a while, and it is possible that btl/vader 
outperforms pml/ucx for intra-node communication,
so if you run on a small number of InfiniBand-interconnected nodes with a large 
number of tasks per node, you might be able
to get the best performance by forcing pml/ob1.

Bottom line, I think it is best for you to benchmark your application and pick 
the combination that leads to the best performance,
and you are more than welcome to share your conclusions.
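
A minimal sketch of such a comparison, assuming ./perf stands in for your 
application and that your build provides btl/uct (otherwise substitute tcp):

# UCX handles both intra- and inter-node traffic
mpirun -np 192 --mca pml ucx ./perf
# ob1 with btl/vader intra-node and btl/uct inter-node
mpirun -np 192 --mca pml ob1 --mca btl self,vader,uct ./perf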

Cheers,

Gilles


On Mon, Mar 6, 2023 at 3:12 PM Chandran, Arun via users <users@lists.open-mpi.org> wrote:

Hi Folks,

I can run benchmarks and find the pml+btl combination (ob1, ucx, uct, vader, 
etc.) that gives the best performance,
but I wanted to hear from the community about what is generally used in 
high-core-count intra-node cases before jumping to conclusions.

As I am a newcomer to Open MPI, I don't want to end up using a combination only 
because it fared better in a benchmark (overfitting?).

Or is the choice of pml+btl for the 'intra-node' case not so important, since 
Open MPI is mainly used 'inter-node' and the networking equipment decides the 
pml+btl (UCX for IB)?

--Arun

-----Original Message-----
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Chandran, Arun via users
Sent: Thursday, March 2, 2023 4:01 PM
To: users@lists.open-mpi.org
Cc: Chandran, Arun <arun.chand...@amd.com>
Subject: [OMPI users] What is the best choice of pml and btl for intranode 
communication

Hi Folks,

As the number of cores in a socket keeps increasing, choosing the right pml/btl 
(ucx, ob1, uct, vader, etc.) that gives the best performance in the 
"intra-node" scenario is important.

For openmpi-4.1.4, which pml/btl combination is best for intra-node 
communication in high-core-count scenarios (point-to-point as well as 
collectives), and why?
Does the answer to the above question hold for the upcoming Open MPI 5 release?

--Arun
