What you're asking for is an ugly path of interconnected dependencies between 
products owned by different companies.  It also completely blows any object 
model we can think of out of the water.  It's all bad in the general case.  The 
best we've come up with for the Libfabric MTL is to disable itself if it only 
finds a blacklisted provider, and then blacklist the VERBS and TCP providers.  
UCX could maybe do this, but it's hard when we're talking VERBS.

I think you actually found the right solution for your network.  You know what 
you want to have happen, so set the priorities accordingly.  A default params 
config file bumping up the PSM2 MTL priority sounds like exactly what I would 
suggest as a workaround.
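
As a rough sketch of what that file could contain, assuming the system-wide 
openmpi-mca-params.conf under the installation's etc directory is the file 
being used, and taking the value 52 from the original message below (it has 
not been checked against the other components' defaults here):

```
# $prefix/etc/openmpi-mca-params.conf  (location is an assumption; adjust
# to wherever this installation keeps its default MCA parameters)

# Raise the PSM2 MTL priority so PSM2 is preferred over PML UCX on the
# OPA nodes. 52 is the value reported to work in the original message.
mtl_psm2_priority = 52
```

The same parameter can also be set per run with "mpirun --mca 
mtl_psm2_priority 52", and "ompi_info --param mtl psm2 --level 9" should show 
the value that is actually in effect.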

Brian

-----Original Message-----
From: users <users-boun...@lists.open-mpi.org> on behalf of Brice Goglin via 
users <users@lists.open-mpi.org>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Friday, November 15, 2019 at 1:55 AM
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: Brice Goglin <brice.gog...@inria.fr>, Ludovic Courtès 
<ludovic.cour...@inria.fr>
Subject: [OMPI users] disabling ucx over omnipath

    Hello
    
    We have a platform with an old MLX4 partition and another OPA partition.
    We want a single OMPI installation working for both kinds of nodes. When
    we enable UCX in OMPI for MLX4, UCX ends up being used on the OPA
    partition too, and the performance is poor (3 GB/s instead of 10 GB/s). The
    problem seems to be that UCX gets enabled because they added support for
    OPA in UCX 1.6, even though that's just poor OPA support through Verbs.
    
    The only solution we found is to bump the mtl_psm2_priority to 52 so
    that PSM2 gets used before PML UCX. Seems to work fine but I am not sure
    it's a good idea. Could OMPI rather tell UCX to disable itself when it
    only finds OPA?
    
    Thanks
    
    Brice
    
    
