Nathan,

Sadly, I'm not sure I can provide a reproducer, as it's currently our full
earth system model accessing terabytes of background files, etc. That said,
I'll work on it. I do have a tiny version of the model, but that usually
works everywhere (and I can only reproduce the issue at a rather high
resolution).
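In the meantime, in case it helps to see the shape of the thing: what we
hang in is just an ordinary wait on a nonblocking exchange. A stripped-down
sketch (made-up C, nothing like our actual model code, with an arbitrary
message size and neighbor pattern) would be something like:

/* Hypothetical sketch only -- NOT our model code.  Just the generic
 * shape: post a nonblocking receive/send pair with a neighbor and then
 * block in MPI_Wait on the requests. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Message size is arbitrary here; the real traffic is much bigger. */
    const int n = 1 << 20;
    double *sendbuf = malloc(n * sizeof(double));
    double *recvbuf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) sendbuf[i] = (double)rank;

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    MPI_Request reqs[2];
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* The waits below are the flavor of call that never returns for us. */
    MPI_Wait(&reqs[0], MPI_STATUS_IGNORE);
    MPI_Wait(&reqs[1], MPI_STATUS_IGNORE);

    if (rank == 0) printf("exchange completed\n");

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

That by itself almost certainly won't reproduce the hang, since for us it
seems to need the scale and the particular collection, but that's the kind
of call that gets stuck.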
We do have a couple of code testers that duplicate functionality around
that MPI_Wait call, but, and this is the fun part, it seems to be a very
specific type of that call (only if you are doing a daily time-averaged
collection!). Still, I'll try to test that tester with Open MPI 2.1.0.
Maybe it'll hang!

As for the kernel, my desktop is 3.10.0-514.16.1.el7.x86_64 (RHEL 7) and
the cluster compute node is on 3.0.101-0.47.90-default (SLES11 SP3). If I
run 'lsmod' I see xpmem on the cluster, but my desktop does not have it.
So, perhaps not XPMEM related?

Matt

On Mon, Jun 5, 2017 at 1:00 PM, Nathan Hjelm <hje...@me.com> wrote:

> Can you provide a reproducer for the hang? What kernel version are you
> using? Is xpmem installed?
>
> -Nathan
>
> On Jun 05, 2017, at 10:53 AM, Matt Thompson <fort...@gmail.com> wrote:
>
> OMPI Users,
>
> I was wondering if there is a best way to "tune" vader to get around an
> intermittent MPI_Wait halt?
>
> I ask because I recently found that if I use Open MPI 2.1.x on either my
> desktop or on the supercomputer I have access to, if vader is enabled,
> the model seems to "deadlock" at an MPI_Wait call. If I run as:
>
> mpirun --mca btl self,sm,tcp
>
> on my desktop it works. When I moved to my cluster, I tried the more
> generic:
>
> mpirun --mca btl ^vader
>
> since it uses openib, and with it things work. Well, I hope that's how
> one would turn off vader in MCA speak. (Note: this deadlock seems a bit
> sporadic, but I do now have a case which seems to cause it reproducibly.)
>
> Now, I know vader is supposed to be the "better" sm communication tech,
> so I'd rather use it and thought maybe I could twiddle some tuning
> knobs. So I looked at:
>
> https://www.open-mpi.org/faq/?category=sm
>
> and there I saw question 6, "How do I know what MCA parameters are
> available for tuning MPI performance?". But when I try the commands
> listed (minus the HTML/CSS tags):
>
> (1081) $ ompi_info --param btl sm
>          MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v2.1.0)
> (1082) $ ompi_info --param mpool sm
> (1083) $
>
> Huh. I expected more, but searching around the Open MPI FAQs made me
> think I should use:
>
> ompi_info --param btl sm --level 9
>
> which does spit out a lot, though the equivalent for mpool sm does not.
>
> Any ideas on which of the many knobs is best to try and turn? Something
> that, by default, perhaps is one thing for sm but different for vader? I
> tried to see if "ompi_info --param btl vader --level 9" did something,
> but it doesn't put anything out.
>
> I will note that this code runs just fine with Open MPI 2.0.2 as well as
> with Intel MPI and SGI MPT, so I'm thinking the code itself is okay, but
> something from Open MPI 2.0.x to Open MPI 2.1.x changed. I see two
> entries in the Open MPI 2.1.0 announcement about vader, but nothing
> specific about how to "revert" if they are even causing the problem:
>
> - Fix regression that lowered the memory maximum message bandwidth for
>   large messages on some BTL network transports, such as openib, sm,
>   and vader.
>
> - The vader BTL is now more efficient in terms of memory usage when
>   using XPMEM.
>
> Thanks for any help,
> Matt
>
> --
> Matt Thompson
> Man Among Men
> Fulcrum of History

--
Matt Thompson
Man Among Men
Fulcrum of History
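P.S. For completeness: the vader-specific knobs I'm planning to poke at
first, assuming I have the parameter names right (ompi_info isn't showing
them to me, per the above, so treat these as unverified), are the
single-copy mechanism and the eager limit, along the lines of:

mpirun --mca btl_vader_single_copy_mechanism none ...
mpirun --mca btl_vader_eager_limit 65536 ...

If either of those changes the behavior, that should at least narrow down
which part of vader is involved.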