I now roughly understand what coll_ml is and how you plan to handle it, thanks.
As Ralph pointed out, I never actually verified that coll_ml was being used. I just assumed the slowdown meant it was. I'll check it later. It might be due to the expensive connectivity computation.

Tetsuya

> One of the authors of ML mentioned to me off-list that he has an idea what might have been causing the slowdown. They're actively working on tweaking and making things better.
>
> I told them to ping you -- the whole point is that ml is supposed to be *better* than our existing collectives, so if it's not, we should fix that before we make ml be the default. :-)
>
>
> On Mar 21, 2014, at 9:04 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> >
> > On Mar 20, 2014, at 5:56 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> >>
> >> Hi Ralph, congratulations on releasing new openmpi-1.7.5.
> >>
> >> By the way, openmpi-1.7.5rc3 has been slowing down our application
> >> with smaller size of testing data, where the time consuming part
> >> of our application is so called sparse solver. It's negligible
> >> with medium or large size data - more practical one, so I have
> >> been deferring this problem.
> >>
> >> However, this slowdown disappears in the final version of
> >> openmpi-1.7.5. After some investigations, I found coll_ml caused
> >> this slowdown. The final version seems to set coll_ml_priority as zero
> >> again.
> >>
> >> Could you explain briefly about the advantage of coll_ml? In what kind
> >> of situation it's effective and so on ...
> >
> > I'm not really the one to speak about coll/ml as I wasn't involved in it - Nathan would be the one to ask. It is supposed to be significantly faster for most collectives, but I imagine it would depend on the precise collective being used and the size of the data. We did find and fix a number of problems right at the end (which is why we dropped the priority until we can better test/debug it), and so we might have hit something that was causing your slow down.
> >
> >>
> >> In addition, I'm not sure why coll_ml is activated in openmpi-1.7.5rc3,
> >> although its priority is lower than tuned as described in the message
> >> of changeset 30790:
> >> We are initially setting the priority lower than
> >> tuned until this has had some time to soak in the trunk.
> >
> > Were you actually seeing coll/ml being used? It shouldn't have been. However, coll/ml was getting called during the collective initialization phase so it could set itself up, even if it wasn't being used. One part of its setup is a somewhat expensive connectivity computation - one of our last-minute cleanups was removal of a static 1MB array in that procedure. Changing the priority to 0 completely disables the coll/ml component, thus removing it from even the initialization phase. My guess is that you were seeing a measurable "hit" by that procedure on your small data tests, which probably ran fairly quickly - and not seeing it on the other tests because the setup time was swamped by the computation time.
> >
> >>
> >> Tetsuya
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
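
Ralph's guess quoted above (a fixed setup cost that is visible on short runs and swamped on long ones) can be illustrated with a back-of-the-envelope model. This is only a sketch: the function name `relative_slowdown` and every timing in it are hypothetical values chosen for illustration, not measurements from Open MPI or from the application discussed in this thread.

```python
# Back-of-the-envelope model: a fixed one-time setup cost added to a run.
# All timings are hypothetical and chosen only for illustration; they are
# not measurements from Open MPI or from the application in this thread.


def relative_slowdown(setup_s: float, compute_s: float) -> float:
    """Return the fraction by which runtime grows when setup_s is added."""
    if compute_s <= 0:
        raise ValueError("compute_s must be positive")
    return setup_s / compute_s


if __name__ == "__main__":
    # Hypothetical one-time cost paid during collective initialization.
    setup_s = 0.5
    # Hypothetical compute times for small, medium, and large test data.
    for label, compute_s in [("small", 2.0), ("medium", 60.0), ("large", 1800.0)]:
        pct = 100.0 * relative_slowdown(setup_s, compute_s)
        print(f"{label:>6}: compute {compute_s:7.1f} s, slowdown {pct:5.2f}%")
```

With a constant setup cost, the relative slowdown shrinks in proportion to the compute time, which would be consistent with seeing the hit only on the small test data. Whether the connectivity computation is really the cause still has to be checked by measurement.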