Yes, Nathan has a few coll ml fixes queued up for 1.8.
On Mar 24, 2014, at 10:11 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> I ran our application using the final version of openmpi-1.7.5 again
> with coll_ml_priority = 90.
>
> Then, coll/ml was actually activated and I got these error message
I ran our application using the final version of openmpi-1.7.5 again
with coll_ml_priority = 90.
Then, coll/ml was actually activated and I got these error messages
as shown below:
[manage][[11217,1],0][coll_ml_lmngr.c:265:mca_coll_ml_lmngr_alloc] COLL-ML
List manager is empty.
[manage][[11217,1
I now roughly understand what coll_ml is and how you
plan to handle it, thanks.
As Ralph pointed out, I never confirmed that coll_ml was really used.
I just assumed the slowdown meant it was. I'll check it
later. It might be due to the expensive connectivity computation.
Tetsuya
One of the authors of ML mentioned to me off-list that he has an idea of what
might have been causing the slowdown. They're actively working on tweaking and
making things better.
I told them to ping you -- the whole point is that ml is supposed to be
*better* than our existing collectives, so if
On Mar 20, 2014, at 5:56 PM, tmish...@jcity.maeda.co.jp wrote:
>
> Hi Ralph, congratulations on releasing the new openmpi-1.7.5.
>
> By the way, openmpi-1.7.5rc3 has been slowing down our application
> with smaller test data sets, where the time-consuming part
> of our application is so calle