Hi,

George Bosilca <bosi...@icl.utk.edu> writes:
> If I'm not mistaken, hcoll is playing with the opal_progress in a way
> that conflicts with the blessed usage of progress in OMPI and prevents
> other components from advancing and timely completing requests. The
> impact is minimal for sequential applications using only blocking
> calls, but is jeopardizing performance when multiple types of
> communications are simultaneously executing or when multiple threads
> are active.
>
> The solution might be very simple: hcoll is a module providing support
> for collective communications, so as long as you don't use collectives,
> or the tuned module provides collective performance similar to hcoll
> on your cluster, just go ahead and disable hcoll. You can also reach
> out to Mellanox folks asking them to fix the hcoll usage of
> opal_progress.

Until we find a more robust solution, I was thinking of simply querying the MPI implementation at run time, using the threaded version if hcoll is not present and falling back to the unthreaded version if it is.

Looking at the coll.h file, I see some functions there that might be useful, for example mca_coll_base_component_comm_query_2_0_0_fn_t, but I have never delved into this area. Would this be an appropriate approach? Are there any examples of how to query in code for the presence of a particular component?

Thanks,
--
Ángel de Vicente
Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/
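P.S. To make the question concrete, below is the kind of run-time probe I had in mind: a minimal, untested sketch. It assumes that Open MPI exposes its MCA parameters as MPI_T control variables and that the hcoll ones carry a "coll_hcoll" prefix; I have not verified the actual names, but something like "ompi_info --param coll hcoll" should confirm them.

/*
 * probe_hcoll.c -- minimal sketch, not tested: walk the MPI_T control
 * variables and look for any whose name starts with "coll_hcoll".
 * Open MPI exposes MCA parameters through MPI_T, so if the hcoll
 * component was built and loaded I would expect its parameters
 * (e.g. coll_hcoll_enable) to show up here; the exact names are
 * my assumption.
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

static int hcoll_component_present(void)
{
    int num_cvar = 0, found = 0;

    MPI_T_cvar_get_num(&num_cvar);
    for (int i = 0; i < num_cvar && !found; i++) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope;
        MPI_Datatype dt;
        MPI_T_enum et;

        if (MPI_SUCCESS == MPI_T_cvar_get_info(i, name, &name_len,
                                               &verbosity, &dt, &et,
                                               desc, &desc_len,
                                               &bind, &scope)
            && 0 == strncmp(name, "coll_hcoll", 10)) {
            found = 1;
        }
    }
    return found;
}

int main(int argc, char *argv[])
{
    int provided;

    /* The MPI_T tool interface has its own init/finalize, independent
       of MPI_Init/MPI_Finalize. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    printf("hcoll component %s\n",
           hcoll_component_present() ? "present" : "not found");

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}

Since MPI_T is standard MPI-3, the probe should simply find no matching names (rather than fail) on builds or implementations without hcoll, which is the behaviour I would want. Does this look reasonable, or is there a more direct way to ask OMPI which coll components are active?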