We cannot reproduce this. On four nodes at 20 PPN (80 ranks), the run takes
exactly the same 19 seconds with and without hcoll.

Which version of HCOLL are you using, and what is your command line?

Josh

On Tue, Feb 4, 2020 at 8:44 AM George Bosilca via users <
users@lists.open-mpi.org> wrote:

> Hcoll will be present in many cases, so you don’t really want to skip them
> all. I foresee 2 problems with the approach you propose:
> - collective components are selected per communicator, so even if they
> will not be used they are still loaded.
> - from outside the MPI library you have little access to internal
> information, especially about which components are loaded and active.
>
> I’m afraid the best solution is to prevent OMPI from loading the hcoll
> component if you want to use threading, by adding '--mca coll ^hcoll' to
> your mpirun.
>
>   George.
>
>
> On Tue, Feb 4, 2020 at 8:32 AM Angel de Vicente <angel.de.vice...@iac.es>
> wrote:
>
>> Hi,
>>
>> George Bosilca <bosi...@icl.utk.edu> writes:
>>
>> > If I'm not mistaken, hcoll is playing with the opal_progress in a way
>> > that conflicts with the blessed usage of progress in OMPI and prevents
>> > other components from advancing and timely completing requests. The
>> > impact is minimal for sequential applications using only blocking
>> > calls, but is jeopardizing performance when multiple types of
>> > communications are simultaneously executing or when multiple threads
>> > are active.
>> >
>> > The solution might be very simple: hcoll is a module providing support
>> > for collective communications so as long as you don't use collectives,
>> > or the tuned module provides collective performance similar to hcoll
>> > on your cluster, just go ahead and disable hcoll. You can also reach
>> > out to Mellanox folks asking them to fix the hcoll usage of
>> > opal_progress.
>>
>> Until we find a more robust solution, I was thinking of simply querying
>> the MPI implementation at run time, using the threaded version if hcoll
>> is not present and the unthreaded version if it is. Looking at the coll.h
>> file, I see some functions there that might be useful, for example
>> mca_coll_base_component_comm_query_2_0_0_fn_t, but I have never delved
>> into this. Would this be an appropriate approach? Are there any examples
>> of how to query for a particular component in code?
>>
>> Thanks,
>> --
>> Ángel de Vicente
>>
>> Tel.: +34 922 605 747
>> Web.: http://research.iac.es/proyecto/polmag/
>>
>>
>>
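
The exclusion George suggests on the mpirun command line can also be
requested from inside the application. The sketch below is only an
illustration, not something proposed in the thread: it relies on Open MPI
reading MCA parameters from environment variables named OMPI_MCA_<param>,
and it assumes the variable is set before MPI_Init or MPI_Init_thread is
called. The helper name exclude_hcoll_if_threaded is hypothetical. It does
not detect whether hcoll is present; it just asks Open MPI not to select it
when the program intends to use threads.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>

/* Hypothetical helper, not part of Open MPI.
 * If the application plans to use threads, request the equivalent of
 * "--mca coll ^hcoll" through the environment. Call this BEFORE
 * MPI_Init / MPI_Init_thread, otherwise it has no effect.
 * Returns 0 on success, -1 if setenv() fails. */
static int exclude_hcoll_if_threaded(int use_threads)
{
    if (!use_threads) {
        return 0;  /* unthreaded run: leave component selection alone */
    }
    if (getenv("OMPI_MCA_coll") != NULL) {
        return 0;  /* respect a "coll" setting the user already made */
    }
    /* "^hcoll" means "every coll component except hcoll". */
    return setenv("OMPI_MCA_coll", "^hcoll", 1);
}
```

A caller would invoke exclude_hcoll_if_threaded(1) as the first statement
of main(), ahead of MPI_Init_thread. A value given on the mpirun command
line or already present in the environment is left untouched by this
helper.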
