From: users on behalf of Michael Di Domenico via users
Sent: Wednesday, November 3, 2021 8:58 AM
Cc: Michael Di Domenico; Open MPI Users
Subject: Re: [OMPI users] [EXTERNAL] strange pml error
this seemed to help me as well, so far at least. still have a lot
more testing to do
On Tue, Nov 2, 2021 at 4:15 PM Shrader, David Lee wrote:
As a workaround for now, I have found that setting OMPI_MCA_pml=ucx seems to
get around this issue. I'm not sure why this works, but perhaps there is
different initialization that happens such that the offending device search
problem doesn't occur?
Thanks,
David
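For reference, the workaround described above can be applied roughly as follows. This is only a sketch: the application name ./my_mpi_app and the rank count are placeholders that do not come from this thread. It assumes a bash-like shell and an Open MPI build that includes UCX support.

```shell
# Option 1: set the MCA parameter through the environment, as in the
# workaround above. Open MPI reads OMPI_MCA_<name> variables at startup.
export OMPI_MCA_pml=ucx

# Option 2: pass the same parameter on the mpirun command line instead.
# ./my_mpi_app and -np 4 are hypothetical placeholders.
#   mpirun --mca pml ucx -np 4 ./my_mpi_app
```

Either form should request the ucx pml explicitly rather than leaving the choice to automatic selection. Whether that avoids the device search problem on a given system still needs to be confirmed by testing.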
I too have been getting this using 4.1.1, but not with the master nightly
tarballs from mid-October. I still have it on my to-do list to open a github
issue. The problem seems to come from device detection in the ucx pml: on some
ranks, it fails to find a device and thus the ucx pml disqualifies