Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-03 Thread Shrader, David Lee via users
From: users on behalf of Michael Di Domenico via users
Sent: Wednesday, November 3, 2021 8:58 AM
Cc: Michael Di Domenico; Open MPI Users
Subject: Re: [OMPI users] [EXTERNAL] strange pml error

this seemed to help me as well, so far at least. still have a lot more testing to do

On Tue, Nov 2

Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-03 Thread Michael Di Domenico via users
this seemed to help me as well, so far at least. still have a lot more testing to do

On Tue, Nov 2, 2021 at 4:15 PM Shrader, David Lee wrote:
>
> As a workaround for now, I have found that setting OMPI_MCA_pml=ucx seems to
> get around this issue. I'm not sure why this works, but perhaps there

Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-02 Thread Shrader, David Lee via users
As a workaround for now, I have found that setting OMPI_MCA_pml=ucx seems to get around this issue. I'm not sure why this works, but perhaps a different initialization path runs, so that the offending device search problem doesn't occur?

Thanks,
David
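For anyone trying the same workaround, a minimal sketch of how the setting could be applied in a job script. The application name `./my_app` and the rank count are hypothetical placeholders, not taken from this thread; the launch lines are left commented out so the snippet runs without an MPI installation.

```shell
# Select the UCX PML through the environment, as described in the thread.
export OMPI_MCA_pml=ucx

# Open MPI reads OMPI_MCA_<param> variables at startup, so a later launch
# from this shell picks the setting up, e.g. (hypothetical application name):
#   mpirun -np 4 ./my_app
#
# The same parameter can also be passed on the command line instead:
#   mpirun --mca pml ucx -np 4 ./my_app

# Confirm what the launcher will see.
echo "OMPI_MCA_pml=${OMPI_MCA_pml}"
```

Setting the variable in the environment keeps existing launch commands unchanged, while the `--mca` form limits the change to a single run. Either way this only forces the PML selection; it does not explain why the default selection fails.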

Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-02 Thread Shrader, David Lee via users
I too have been getting this with Open MPI 4.1.1, but not with the master nightly tarballs from mid-October. I still have it on my to-do list to open a GitHub issue. The problem seems to come from device detection in the ucx pml: on some ranks, it fails to find a device and thus the ucx pml disqualifies