Hi, in one of our codes we want to create a log of the events that happen in the MPI processes, where the number and timing of these events are unpredictable.
So I implemented a simple test code in which process 0 creates a thread that just busy-waits for messages from any process and writes them to stdout/stderr/a log file as they arrive (a stripped-down sketch of the pattern is included at the end of this message). The test code is at https://github.com/angel-devicente/thread_io, and the same idea went into our "real" code. As far as I could see, this behaves very nicely: there are no deadlocks, no lost messages, and the performance penalty is minimal for the kind of application this is intended for.

But then I found that on a local cluster the performance was very bad: a test that takes ~5s with my own OpenMPI installation takes ~5min 50s with the locally installed OpenMPI (same gcc and OpenMPI versions in both). Checking the OpenMPI configuration details, I found that the locally installed OpenMPI was configured to use the Mellanox IB driver, and that in particular the hcoll component was somehow killing performance: running with

  mpirun --mca coll_hcoll_enable 0 -np 51 ./test_t

took ~5s, while enabling coll_hcoll killed performance as described above (on a single node performance also degrades, but only by about a factor of 2).

Has anyone seen anything like this? Perhaps a newer Mellanox driver would solve the problem? We were planning to make our code public, but before we do so I want to understand under which conditions the "Threaded I/O" approach can run into this problem and, if possible, how to get rid of it completely.

Any help/pointers appreciated.
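In case it helps to make the discussion concrete, below is roughly the pattern I mean. It is a stripped-down sketch, not the actual code in the repository: the tag, the buffer size, the zero-length "I am done" message used for shutdown, and the use of MPI_Iprobe for the busy-wait are just illustrative choices, and it assumes MPI is initialized with MPI_THREAD_MULTIPLE.

  /* Sketch of the logger-thread pattern: rank 0 spawns a thread that
   * busy-waits for log messages from any rank and prints them; every
   * rank sends a few messages and then a zero-length "done" message. */
  #include <mpi.h>
  #include <pthread.h>
  #include <stdio.h>
  #include <string.h>

  #define LOG_TAG 42
  #define MAX_MSG 256

  static int world_size;

  static void *logger(void *arg)
  {
      (void)arg;
      char buf[MAX_MSG];
      int done = 0;                      /* ranks that have finished logging */
      while (done < world_size) {
          int flag = 0, len;
          MPI_Status st;
          MPI_Iprobe(MPI_ANY_SOURCE, LOG_TAG, MPI_COMM_WORLD, &flag, &st);
          if (!flag) continue;           /* busy-wait: poll again */
          MPI_Get_count(&st, MPI_CHAR, &len);
          MPI_Recv(buf, MAX_MSG, MPI_CHAR, st.MPI_SOURCE, LOG_TAG,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          if (len == 0)                  /* zero-length message = rank is done */
              done++;
          else
              printf("[rank %d] %s\n", st.MPI_SOURCE, buf);
      }
      return NULL;
  }

  int main(int argc, char *argv[])
  {
      int provided, rank;
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      if (provided < MPI_THREAD_MULTIPLE) {
          fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
          MPI_Abort(MPI_COMM_WORLD, 1);
      }
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &world_size);

      pthread_t tid;
      if (rank == 0)
          pthread_create(&tid, NULL, logger, NULL);

      /* Every rank (rank 0 included) logs a couple of events. */
      char msg[MAX_MSG];
      for (int i = 0; i < 2; i++) {
          snprintf(msg, MAX_MSG, "event %d", i);
          MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 0, LOG_TAG,
                   MPI_COMM_WORLD);
      }
      /* Zero-length message tells the logger this rank will send no more. */
      MPI_Send(msg, 0, MPI_CHAR, 0, LOG_TAG, MPI_COMM_WORLD);

      if (rank == 0)
          pthread_join(tid, NULL);

      MPI_Finalize();
      return 0;
  }

In the sketch the messages simply go to stdout; in the real code they can also go to stderr or a log file, but the structure is the same.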
--
Ángel de Vicente
Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/