Nathan,

Sadly, I'm not sure I can provide a reproducer, as it's currently our full
earth system model accessing terabytes of background files, etc. That said,
I'll work on it. I do have a tiny version of the model, but that usually
works everywhere (and I can only reproduce the issue at a rather high
resolution).
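In the meantime, in case it helps to see the shape of the thing: what we
hang in is just an ordinary wait on a nonblocking exchange. A stripped-down
sketch (made-up C, nothing like our actual model code, with an arbitrary
message size and neighbor pattern) would be something like:

/* Hypothetical sketch only -- NOT our model code.  Just the generic
 * shape: post a nonblocking receive/send pair with a neighbor and then
 * block in MPI_Wait on the requests. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Message size is arbitrary here; the real traffic is much bigger. */
    const int n = 1 << 20;
    double *sendbuf = malloc(n * sizeof(double));
    double *recvbuf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) sendbuf[i] = (double)rank;

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    MPI_Request reqs[2];
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* The waits below are the flavor of call that never returns for us. */
    MPI_Wait(&reqs[0], MPI_STATUS_IGNORE);
    MPI_Wait(&reqs[1], MPI_STATUS_IGNORE);

    if (rank == 0) printf("exchange completed\n");

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

That by itself almost certainly won't reproduce the hang, since for us it
seems to need the scale and the particular collection, but that's the kind
of call that gets stuck.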
We do have a couple of code testers that duplicate functionality around
that MPI_Wait call, but, and this is the fun part, it seems to be a very
specific type of that call (only if you are doing a daily time-averaged
collection!). Still, I'll try to test that tester with Open MPI 2.1.0.
Maybe it'll hang!

As for the kernel, my desktop is 3.10.0-514.16.1.el7.x86_64 (RHEL 7) and
the cluster compute node is on 3.0.101-0.47.90-default (SLES11 SP3). If I
run 'lsmod' I see xpmem on the cluster, but my desktop does not have it.
So, perhaps not XPMEM related?

Matt

On Mon, Jun 5, 2017 at 1:00 PM, Nathan Hjelm <hje...@me.com> wrote:

> Can you provide a reproducer for the hang? What kernel version are you
> using? Is xpmem installed?
>
> -Nathan
>
> On Jun 05, 2017, at 10:53 AM, Matt Thompson <fort...@gmail.com> wrote:
>
> OMPI Users,
>
> I was wondering if there is a best way to "tune" vader to get around an
> intermittent MPI_Wait halt?
>
> I ask because I recently found that if I use Open MPI 2.1.x on either my
> desktop or on the supercomputer I have access to, if vader is enabled,
> the model seems to "deadlock" at an MPI_Wait call. If I run as:
>
> mpirun --mca btl self,sm,tcp
>
> on my desktop it works. When I moved to my cluster, I tried the more
> generic:
>
> mpirun --mca btl ^vader
>
> since it uses openib, and with it things work. Well, I hope that's how
> one would turn off vader in MCA speak. (Note: this deadlock seems a bit
> sporadic, but I do now have a case which seems to cause it reproducibly.)
>
> Now, I know vader is supposed to be the "better" sm communication tech,
> so I'd rather use it and thought maybe I could twiddle some tuning
> knobs. So I looked at:
>
> https://www.open-mpi.org/faq/?category=sm
>
> and there I saw question 6, "How do I know what MCA parameters are
> available for tuning MPI performance?". But when I try the commands
> listed (minus the HTML/CSS tags):
>
> (1081) $ ompi_info --param btl sm
>          MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v2.1.0)
> (1082) $ ompi_info --param mpool sm
> (1083) $
>
> Huh. I expected more, but searching around the Open MPI FAQs made me
> think I should use:
>
> ompi_info --param btl sm --level 9
>
> which does spit out a lot, though the equivalent for mpool sm does not.
>
> Any ideas on which of the many knobs is best to try and turn? Something
> that, by default, perhaps is one thing for sm but different for vader? I
> tried to see if "ompi_info --param btl vader --level 9" did something,
> but it doesn't put anything out.
>
> I will note that this code runs just fine with Open MPI 2.0.2 as well as
> with Intel MPI and SGI MPT, so I'm thinking the code itself is okay, but
> something from Open MPI 2.0.x to Open MPI 2.1.x changed. I see two
> entries in the Open MPI 2.1.0 announcement about vader, but nothing
> specific about how to "revert" if they are even causing the problem:
>
> - Fix regression that lowered the memory maximum message bandwidth for
>   large messages on some BTL network transports, such as openib, sm,
>   and vader.
>
> - The vader BTL is now more efficient in terms of memory usage when
>   using XPMEM.
>
> Thanks for any help,
> Matt
>
> --
> Matt Thompson
> Man Among Men
> Fulcrum of History

--
Matt Thompson
Man Among Men
Fulcrum of History
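P.S. For completeness: the vader-specific knobs I'm planning to poke at
first, assuming I have the parameter names right (ompi_info isn't showing
them to me, per the above, so treat these as unverified), are the
single-copy mechanism and the eager limit, along the lines of:

mpirun --mca btl_vader_single_copy_mechanism none ...
mpirun --mca btl_vader_eager_limit 65536 ...

If either of those changes the behavior, that should at least narrow down
which part of vader is involved.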