I'd suggest opening a ticket on the UCX repo itself. This looks to me like UCX 
not recognizing a Mellanox device, or at least not initializing it correctly.
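
If you do file it, it's probably worth attaching what UCX itself reports about the
card, so the developers can see whether the mlx4 device is detected at all. Something
along these lines should capture the useful bits (all standard ucx_info / verbs
tooling; adjust the grep to taste):

    ucx_info -v                                      # UCX version and build options
    ucx_info -d | grep -B1 -A3 mlx4                  # does UCX enumerate mlx4_0 at all?
    ibv_devinfo                                      # verbs-level view of the same HCA
    UCX_LOG_LEVEL=debug ucx_info -d 2>&1 | tail -50  # any initialization errors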


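One thing worth ruling out on the Open MPI side first: the "did not match transport
list" lines are printed by Open MPI's UCX glue code, which filters the transports UCX
reports against an allowed list, and plain rc/ud (what UCX 1.5 reports for mlx4) may
not be on that default list. The parameter names differ between 4.0.x releases, so
check what your build actually exposes before widening anything; the names below are
only my guess and need to be verified against the ompi_info output:

    ompi_info --all | grep -i -E 'ucx.*(tls|devices)'   # what does this build call the filter?
    # if it turns out to be opal_common_ucx_tls / opal_common_ucx_devices:
    export OMPI_MCA_opal_common_ucx_tls=any
    export OMPI_MCA_opal_common_ucx_devices=any

In the meantime, OMPI_MCA_pml=ob1 (as discussed below) at least keeps jobs running.
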
> On Aug 11, 2021, at 8:21 AM, Ryan Novosielski <novos...@rutgers.edu> wrote:
> 
> Thanks. That /is/ one solution, and what I’ll do in the interim since this 
> has to work in at least some fashion, but I would actually like to use UCX if 
> OpenIB is going to be deprecated. How do I find out what’s actually wrong?
> 
> --
> #BlackLivesMatter
> ____
> || \\UTGERS,           
> |---------------------------*O*---------------------------
> ||_// the State        |         Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ        | Office of Advanced Research Computing - MSB C630, 
> Newark
>     `'
> 
>> On Jul 29, 2021, at 11:35 AM, Ralph Castain via users 
>> <users@lists.open-mpi.org> wrote:
>> 
>> So it _is_ UCX that is the problem! Try using OMPI_MCA_pml=ob1 instead
>> 
>>> On Jul 29, 2021, at 8:33 AM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>>> 
>>> Thanks, Ralph. This /does/ change things, but not by much. I wasn't under the 
>>> impression that I needed to do that, since when I ran without having built 
>>> against UCX, it warned me that the openib BTL was being deprecated. By default, 
>>> does OpenMPI no longer use either one, so that I have to ask for UCX 
>>> explicitly? Seems strange.
>>> 
>>> Anyhow, I’ve got some variables defined still, in addition to your 
>>> suggestion, for verbosity:
>>> 
>>> [novosirj@amarel-test2 ~]$ env | grep ^OMPI
>>> OMPI_MCA_pml=ucx
>>> OMPI_MCA_opal_common_ucx_opal_mem_hooks=1
>>> OMPI_MCA_pml_ucx_verbose=100
>>> 
>>> Here goes:
>>> 
>>> [novosirj@amarel-test2 ~]$ srun -n 2 --mpi=pmi2 -p oarc  --reservation=UCX 
>>> ./mpihello-gcc-8-openmpi-4.0.6
>>> srun: job 13995650 queued and waiting for resources
>>> srun: job 13995650 has been allocated resources
>>> --------------------------------------------------------------------------
>>> WARNING: There was an error initializing an OpenFabrics device.
>>> 
>>> Local host:   gpu004
>>> Local device: mlx4_0
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> WARNING: There was an error initializing an OpenFabrics device.
>>> 
>>> Local host:   gpu004
>>> Local device: mlx4_0
>>> --------------------------------------------------------------------------
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>> OPAL memory hooks as external events
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>> OPAL memory hooks as external events
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>>> mca_pml_ucx_open: UCX version 1.5.2
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>>> mca_pml_ucx_open: UCX version 1.5.2
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>> self/self: did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/eno1: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/ib0: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>> self/self: did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>> rc/mlx4_0:1: did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>> ud/mlx4_0:1: did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/sysv: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/posix: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 cma/cma: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29823] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>>> level is none
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/eno1: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/ib0: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>> rc/mlx4_0:1: did not match transport list
>>> --------------------------------------------------------------------------
>>> No components were able to be opened in the pml framework.
>>> 
>>> This typically means that either no components of this type were
>>> installed, or none of the installed components can be loaded.
>>> Sometimes this means that shared libraries required by these
>>> components are unable to be found/loaded.
>>> 
>>> Host:      gpu004
>>> Framework: pml
>>> --------------------------------------------------------------------------
>>> [gpu004.amarel.rutgers.edu:29823] PML ucx cannot be selected
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>> ud/mlx4_0:1: did not match transport list
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/sysv: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/posix: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 cma/cma: 
>>> did not match transport list
>>> [gpu004.amarel.rutgers.edu:29824] 
>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>>> level is none
>>> --------------------------------------------------------------------------
>>> No components were able to be opened in the pml framework.
>>> 
>>> This typically means that either no components of this type were
>>> installed, or none of the installed components can be loaded.
>>> Sometimes this means that shared libraries required by these
>>> components are unable to be found/loaded.
>>> 
>>> Host:      gpu004
>>> Framework: pml
>>> --------------------------------------------------------------------------
>>> [gpu004.amarel.rutgers.edu:29824] PML ucx cannot be selected
>>> slurmstepd: error: *** STEP 13995650.0 ON gpu004 CANCELLED AT 
>>> 2021-07-29T11:31:19 ***
>>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>>> srun: error: gpu004: tasks 0-1: Exited with exit code 1
>>> 
>>> --
>>> #BlackLivesMatter
>>> ____
>>> || \\UTGERS,         
>>> |---------------------------*O*---------------------------
>>> ||_// the State      |         Ryan Novosielski - novos...@rutgers.edu
>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>>> ||  \\    of NJ      | Office of Advanced Research Computing - MSB C630, 
>>> Newark
>>>   `'
>>> 
>>>> On Jul 29, 2021, at 8:34 AM, Ralph Castain via users 
>>>> <users@lists.open-mpi.org> wrote:
>>>> 
>>>> Ryan - I suspect what Sergey was trying to say was that you need to ensure 
>>>> OMPI doesn't try to use the OpenIB driver, or at least that it doesn't 
>>>> attempt to initialize it. Try adding
>>>> 
>>>> OMPI_MCA_pml=ucx
>>>> 
>>>> to your environment.
>>>> 
>>>> 
>>>>> On Jul 29, 2021, at 1:56 AM, Sergey Oblomov via users 
>>>>> <users@lists.open-mpi.org> wrote:
>>>>> 
>>>>> Hi
>>>>> 
>>>>> This issue comes from the openib BTL; it is not related to UCX.
>>>>> 
>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of Ryan 
>>>>> Novosielski via users <users@lists.open-mpi.org>
>>>>> Date: Thursday, 29 July 2021, 08:25
>>>>> To: users@lists.open-mpi.org <users@lists.open-mpi.org>
>>>>> Cc: Ryan Novosielski <novos...@rutgers.edu>
>>>>> Subject: [OMPI users] OpenMPI 4.0.6 w/GCC 8.5 on CentOS 7.9; "WARNING: 
>>>>> There was an error initializing an OpenFabrics device."
>>>>> 
>>>>> Hi there,
>>>>> 
>>>>> I'm new to using UCX; I started down this path after building OpenMPI without 
>>>>> it, running tests, and getting deprecation warnings. I installed UCX from the 
>>>>> distribution:
>>>>> 
>>>>> [novosirj@amarel-test2 ~]$ rpm -qa ucx
>>>>> ucx-1.5.2-1.el7.x86_64
>>>>> 
>>>>> …and rebuilt OpenMPI. Built fine. However, I’m getting some pretty 
>>>>> unhelpful messages about not using the IB card. I looked around the 
>>>>> internet some and set a couple of environment variables to get a little 
>>>>> more information:
>>>>> 
>>>>> export OMPI_MCA_opal_common_ucx_opal_mem_hooks=1
>>>>> export OMPI_MCA_pml_ucx_verbose=100
>>>>> 
>>>>> Here’s what happens:
>>>>> 
>>>>> [novosirj@amarel-test2 ~]$ srun -n 2 --mpi=pmi2 -p oarc  
>>>>> --reservation=UCX ./mpihello-gcc-8-openmpi-4.0.6 
>>>>> srun: job 13993927 queued and waiting for resources
>>>>> srun: job 13993927 has been allocated resources
>>>>> --------------------------------------------------------------------------
>>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>>> 
>>>>> Local host:   gpu004
>>>>> Local device: mlx4_0
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>>> 
>>>>> Local host:   gpu004
>>>>> Local device: mlx4_0
>>>>> --------------------------------------------------------------------------
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>>> OPAL memory hooks as external events
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>>>>> mca_pml_ucx_open: UCX version 1.5.2
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>>> OPAL memory hooks as external events
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>>>>> mca_pml_ucx_open: UCX version 1.5.2
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> self/self: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> tcp/eno1: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> self/self: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> tcp/ib0: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> rc/mlx4_0:1: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> ud/mlx4_0:1: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> mm/sysv: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> mm/posix: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> cma/cma: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>>>>> level is none
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:268 
>>>>> mca_pml_ucx_close
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> tcp/eno1: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> tcp/ib0: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> rc/mlx4_0:1: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> ud/mlx4_0:1: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> mm/sysv: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> mm/posix: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>>> cma/cma: did not match transport list
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>>>>> level is none
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:268 
>>>>> mca_pml_ucx_close
>>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>>> OPAL memory hooks as external events
>>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>>> OPAL memory hooks as external events
>>>>> Hello world from processor gpu004.amarel.rutgers.edu, rank 0 out of 2 
>>>>> processors
>>>>> Hello world from processor gpu004.amarel.rutgers.edu, rank 1 out of 2 
>>>>> processors
>>>>> 
>>>>> Here’s the output of a couple more commands that seem to be recommended 
>>>>> when looking into this:
>>>>> 
>>>>> [novosirj@gpu004 ~]$ ucx_info -d
>>>>> #
>>>>> # Memory domain: self
>>>>> #            component: self
>>>>> #             register: unlimited, cost: 0 nsec
>>>>> #           remote key: 8 bytes
>>>>> #
>>>>> #   Transport: self
>>>>> #
>>>>> #   Device: self
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 6911.00 MB/sec
>>>>> #              latency: 0 nsec
>>>>> #             overhead: 10 nsec
>>>>> #            put_short: <= 4294967295
>>>>> #            put_bcopy: unlimited
>>>>> #            get_bcopy: unlimited
>>>>> #             am_short: <= 8k
>>>>> #             am_bcopy: <= 8k
>>>>> #               domain: cpu
>>>>> #           atomic_add: 32, 64 bit
>>>>> #           atomic_and: 32, 64 bit
>>>>> #            atomic_or: 32, 64 bit
>>>>> #           atomic_xor: 32, 64 bit
>>>>> #          atomic_fadd: 32, 64 bit
>>>>> #          atomic_fand: 32, 64 bit
>>>>> #           atomic_for: 32, 64 bit
>>>>> #          atomic_fxor: 32, 64 bit
>>>>> #          atomic_swap: 32, 64 bit
>>>>> #         atomic_cswap: 32, 64 bit
>>>>> #           connection: to iface
>>>>> #             priority: 0
>>>>> #       device address: 0 bytes
>>>>> #        iface address: 8 bytes
>>>>> #       error handling: none
>>>>> #
>>>>> #
>>>>> # Memory domain: tcp
>>>>> #            component: tcp
>>>>> #
>>>>> #   Transport: tcp
>>>>> #
>>>>> #   Device: eno1
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 113.16 MB/sec
>>>>> #              latency: 5776 nsec
>>>>> #             overhead: 50000 nsec
>>>>> #             am_bcopy: <= 8k
>>>>> #           connection: to iface
>>>>> #             priority: 1
>>>>> #       device address: 4 bytes
>>>>> #        iface address: 2 bytes
>>>>> #       error handling: none
>>>>> #
>>>>> #   Device: ib0
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 6239.81 MB/sec
>>>>> #              latency: 5210 nsec
>>>>> #             overhead: 50000 nsec
>>>>> #             am_bcopy: <= 8k
>>>>> #           connection: to iface
>>>>> #             priority: 1
>>>>> #       device address: 4 bytes
>>>>> #        iface address: 2 bytes
>>>>> #       error handling: none
>>>>> #
>>>>> #
>>>>> # Memory domain: ib/mlx4_0
>>>>> #            component: ib
>>>>> #             register: unlimited, cost: 90 nsec
>>>>> #           remote key: 16 bytes
>>>>> #           local memory handle is required for zcopy
>>>>> #
>>>>> #   Transport: rc
>>>>> #
>>>>> #   Device: mlx4_0:1
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 6433.22 MB/sec
>>>>> #              latency: 900 nsec + 1 * N
>>>>> #             overhead: 75 nsec
>>>>> #            put_short: <= 88
>>>>> #            put_bcopy: <= 8k
>>>>> #            put_zcopy: <= 1g, up to 6 iov
>>>>> #  put_opt_zcopy_align: <= 512
>>>>> #        put_align_mtu: <= 2k
>>>>> #            get_bcopy: <= 8k
>>>>> #            get_zcopy: 33..1g, up to 6 iov
>>>>> #  get_opt_zcopy_align: <= 512
>>>>> #        get_align_mtu: <= 2k
>>>>> #             am_short: <= 87
>>>>> #             am_bcopy: <= 8191
>>>>> #             am_zcopy: <= 8191, up to 5 iov
>>>>> #   am_opt_zcopy_align: <= 512
>>>>> #         am_align_mtu: <= 2k
>>>>> #            am header: <= 127
>>>>> #               domain: device
>>>>> #           connection: to ep
>>>>> #             priority: 10
>>>>> #       device address: 3 bytes
>>>>> #           ep address: 4 bytes
>>>>> #       error handling: peer failure
>>>>> #
>>>>> #
>>>>> #   Transport: ud
>>>>> #
>>>>> #   Device: mlx4_0:1
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 6433.22 MB/sec
>>>>> #              latency: 910 nsec
>>>>> #             overhead: 105 nsec
>>>>> #             am_short: <= 172
>>>>> #             am_bcopy: <= 4088
>>>>> #             am_zcopy: <= 4088, up to 7 iov
>>>>> #   am_opt_zcopy_align: <= 512
>>>>> #         am_align_mtu: <= 4k
>>>>> #            am header: <= 3984
>>>>> #           connection: to ep, to iface
>>>>> #             priority: 10
>>>>> #       device address: 3 bytes
>>>>> #        iface address: 3 bytes
>>>>> #           ep address: 6 bytes
>>>>> #       error handling: peer failure
>>>>> #
>>>>> #
>>>>> # Memory domain: rdmacm
>>>>> #            component: rdmacm
>>>>> #           supports client-server connection establishment via sockaddr
>>>>> #   < no supported devices found >
>>>>> #
>>>>> # Memory domain: sysv
>>>>> #            component: sysv
>>>>> #             allocate: unlimited
>>>>> #           remote key: 32 bytes
>>>>> #
>>>>> #   Transport: mm
>>>>> #
>>>>> #   Device: sysv
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 6911.00 MB/sec
>>>>> #              latency: 80 nsec
>>>>> #             overhead: 10 nsec
>>>>> #            put_short: <= 4294967295
>>>>> #            put_bcopy: unlimited
>>>>> #            get_bcopy: unlimited
>>>>> #             am_short: <= 92
>>>>> #             am_bcopy: <= 8k
>>>>> #               domain: cpu
>>>>> #           atomic_add: 32, 64 bit
>>>>> #           atomic_and: 32, 64 bit
>>>>> #            atomic_or: 32, 64 bit
>>>>> #           atomic_xor: 32, 64 bit
>>>>> #          atomic_fadd: 32, 64 bit
>>>>> #          atomic_fand: 32, 64 bit
>>>>> #           atomic_for: 32, 64 bit
>>>>> #          atomic_fxor: 32, 64 bit
>>>>> #          atomic_swap: 32, 64 bit
>>>>> #         atomic_cswap: 32, 64 bit
>>>>> #           connection: to iface
>>>>> #             priority: 0
>>>>> #       device address: 8 bytes
>>>>> #        iface address: 16 bytes
>>>>> #       error handling: none
>>>>> #
>>>>> #
>>>>> # Memory domain: posix
>>>>> #            component: posix
>>>>> #             allocate: unlimited
>>>>> #           remote key: 37 bytes
>>>>> #
>>>>> #   Transport: mm
>>>>> #
>>>>> #   Device: posix
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 6911.00 MB/sec
>>>>> #              latency: 80 nsec
>>>>> #             overhead: 10 nsec
>>>>> #            put_short: <= 4294967295
>>>>> #            put_bcopy: unlimited
>>>>> #            get_bcopy: unlimited
>>>>> #             am_short: <= 92
>>>>> #             am_bcopy: <= 8k
>>>>> #               domain: cpu
>>>>> #           atomic_add: 32, 64 bit
>>>>> #           atomic_and: 32, 64 bit
>>>>> #            atomic_or: 32, 64 bit
>>>>> #           atomic_xor: 32, 64 bit
>>>>> #          atomic_fadd: 32, 64 bit
>>>>> #          atomic_fand: 32, 64 bit
>>>>> #           atomic_for: 32, 64 bit
>>>>> #          atomic_fxor: 32, 64 bit
>>>>> #          atomic_swap: 32, 64 bit
>>>>> #         atomic_cswap: 32, 64 bit
>>>>> #           connection: to iface
>>>>> #             priority: 0
>>>>> #       device address: 8 bytes
>>>>> #        iface address: 16 bytes
>>>>> #       error handling: none
>>>>> #
>>>>> #
>>>>> # Memory domain: cma
>>>>> #            component: cma
>>>>> #             register: unlimited, cost: 9 nsec
>>>>> #
>>>>> #   Transport: cma
>>>>> #
>>>>> #   Device: cma
>>>>> #
>>>>> #      capabilities:
>>>>> #            bandwidth: 11145.00 MB/sec
>>>>> #              latency: 80 nsec
>>>>> #             overhead: 400 nsec
>>>>> #            put_zcopy: unlimited, up to 16 iov
>>>>> #  put_opt_zcopy_align: <= 1
>>>>> #        put_align_mtu: <= 1
>>>>> #            get_zcopy: unlimited, up to 16 iov
>>>>> #  get_opt_zcopy_align: <= 1
>>>>> #        get_align_mtu: <= 1
>>>>> #           connection: to iface
>>>>> #             priority: 0
>>>>> #       device address: 8 bytes
>>>>> #        iface address: 4 bytes
>>>>> #       error handling: none
>>>>> #
>>>>> 
>>>>> [novosirj@gpu004 ~]$ ucx_info -p -u t
>>>>> #
>>>>> # UCP context
>>>>> #
>>>>> #            md 0  :  self
>>>>> #            md 1  :  tcp
>>>>> #            md 2  :  ib/mlx4_0
>>>>> #            md 3  :  rdmacm
>>>>> #            md 4  :  sysv
>>>>> #            md 5  :  posix
>>>>> #            md 6  :  cma
>>>>> #
>>>>> #      resource 0  :  md 0  dev 0  flags -- self/self
>>>>> #      resource 1  :  md 1  dev 1  flags -- tcp/eno1
>>>>> #      resource 2  :  md 1  dev 2  flags -- tcp/ib0
>>>>> #      resource 3  :  md 2  dev 3  flags -- rc/mlx4_0:1
>>>>> #      resource 4  :  md 2  dev 3  flags -- ud/mlx4_0:1
>>>>> #      resource 5  :  md 3  dev 4  flags -s rdmacm/sockaddr
>>>>> #      resource 6  :  md 4  dev 5  flags -- mm/sysv
>>>>> #      resource 7  :  md 5  dev 6  flags -- mm/posix
>>>>> #      resource 8  :  md 6  dev 7  flags -- cma/cma
>>>>> #
>>>>> # memory: 0.84MB, file descriptors: 2
>>>>> # create time: 5.032 ms
>>>>> #
>>>>> 
>>>>> Thanks for any help you can offer. What am I missing?
>>>>> 
>>>>> --
>>>>> #BlackLivesMatter
>>>>> ____
>>>>> || \\UTGERS,      
>>>>> |---------------------------*O*---------------------------
>>>>> ||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
>>>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS 
>>>>> Campus
>>>>> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, 
>>>>> Newark
>>>>>  `'
>>>>> 
>>>> 
>>> 
>> 
>> 
> 

