We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it takes exactly the same 19 secs (80 ranks).
What version of HCOLL are you using? Command line?

Josh

On Tue, Feb 4, 2020 at 8:44 AM George Bosilca via users <users@lists.open-mpi.org> wrote:

> Hcoll will be present in many cases, you don’t really want to skip them
> all. I foresee 2 problems with the approach you propose:
> - collective components are selected per communicator, so even if they
> will not be used they are still loaded.
> - from outside the MPI library you have little access to internal
> information, especially to components that are loaded and active.
>
> I’m afraid the best solution is to prevent OMPI from loading the hcoll
> component if you want to use threading, by adding ‘--mca coll ^hcoll’ to
> your mpirun.
>
> George.
>
>
> On Tue, Feb 4, 2020 at 8:32 AM Angel de Vicente <angel.de.vice...@iac.es>
> wrote:
>
>> Hi,
>>
>> George Bosilca <bosi...@icl.utk.edu> writes:
>>
>> > If I'm not mistaken, hcoll is playing with the opal_progress in a way
>> > that conflicts with the blessed usage of progress in OMPI and prevents
>> > other components from advancing and timely completing requests. The
>> > impact is minimal for sequential applications using only blocking
>> > calls, but is jeopardizing performance when multiple types of
>> > communications are simultaneously executing or when multiple threads
>> > are active.
>> >
>> > The solution might be very simple: hcoll is a module providing support
>> > for collective communications so as long as you don't use collectives,
>> > or the tuned module provides collective performance similar to hcoll
>> > on your cluster, just go ahead and disable hcoll. You can also reach
>> > out to Mellanox folks asking them to fix the hcoll usage of
>> > opal_progress.
>>
>> until we find a more robust solution I was thinking of just querying
>> the MPI implementation at run time and using the threaded version if
>> hcoll is not present and the unthreaded version if it is.
>> Looking at the coll.h file I see that some functions there might be
>> useful, for example: mca_coll_base_component_comm_query_2_0_0_fn_t, but
>> I have never delved here. Would this be an appropriate approach? Any
>> examples of how to query for a particular component in code?
>>
>> Thanks,
>> --
>> Ángel de Vicente
>>
>> Tel.: +34 922 605 747
>> Web.: http://research.iac.es/proyecto/polmag/
>>
>> ---------------------------------------------------------------------------------------------
>> WARNING: For more information on privacy and fulfilment of the Law
>> concerning the Protection of Data, consult
>> http://www.iac.es/disclaimer.php?lang=en
>>
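
For reference, the workaround suggested in this thread (adding `--mca coll ^hcoll` to mpirun when threading is wanted) could be applied from a launcher script rather than from inside the application. The following is only a rough sketch under stated assumptions: it assumes the plain-text output of `ompi_info` lists components on lines of the form `MCA coll: <name> (...)`, and the helper names (`coll_components`, `hcoll_available`, `mpirun_command`, `read_ompi_info`) are hypothetical, not part of Open MPI. It reports which components are installed, not which ones are selected for a given communicator.

```python
import re
import shutil
import subprocess

# Assumed line format in plain `ompi_info` output, e.g.
#   "MCA coll: hcoll (MCA v2.1.0, API v2.0.0, Component v4.0.2)"
_COLL_LINE = re.compile(r"^\s*MCA coll:\s+(\S+)\s+\(", re.MULTILINE)


def coll_components(ompi_info_text):
    """Return the set of coll component names found in ompi_info output."""
    return set(_COLL_LINE.findall(ompi_info_text))


def hcoll_available(ompi_info_text):
    """True if an hcoll coll component is listed as installed."""
    return "hcoll" in coll_components(ompi_info_text)


def mpirun_command(app_args, nprocs, use_threads, ompi_info_text):
    """Build an mpirun argument list, excluding hcoll for threaded runs."""
    cmd = ["mpirun", "-np", str(nprocs)]
    if use_threads and hcoll_available(ompi_info_text):
        # '^hcoll' asks Open MPI to load every coll component except hcoll.
        cmd += ["--mca", "coll", "^hcoll"]
    return cmd + list(app_args)


def read_ompi_info():
    """Run ompi_info if it is on PATH; return an empty string otherwise."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return ""
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return result.stdout
```

A launcher could call `mpirun_command(["./app"], 80, True, read_ompi_info())` and pass the resulting list to `subprocess.run`. Since excluding hcoll is harmless when it is not installed, always adding the option for threaded runs would be an equally reasonable choice.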