Hi Wolfgang,

Thanks for writing this up. I think that, in my circumstances, I will be able
to get away with duplicating communicators, since the objects should exist for
the entire program run (i.e., I might make a few dozen instances at most):
each object can duplicate a communicator and then set up its vectors and
whatnot with it. I should be able to do all of this in my own code rather than
in deal.II, so I don't think I'll have to dig up any old patches to the library.
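
Roughly what I have in mind (just a sketch; the class and member names are
made up):

  #include <mpi.h>

  // Sketch only: a long-lived object that owns a duplicated communicator
  // and sets up its vectors and whatnot on it.
  class MyOperator
  {
  public:
    MyOperator(const MPI_Comm mpi_communicator)
    {
      // Each instance gets a private communicator, so its internal
      // communication cannot interfere with anyone else's.
      MPI_Comm_dup(mpi_communicator, &own_communicator);
    }

    ~MyOperator()
    {
      MPI_Comm_free(&own_communicator);
    }

  private:
    MPI_Comm own_communicator;
    // ...vectors etc., reinit()ed with own_communicator...
  };

Since only a few dozen instances would ever exist, this stays far away from
any limit on the number of communicators.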

> It only manifested in a crash after a program had been running for many
> hours, and small changes to the program made the bug move to unrelated
> pieces of the code, but did not make it appear any earlier in the run time
> of the program.

That's frightening; now I'm not sure how PETSc avoids this problem, and I am
somewhat afraid to even look.

Best,
David
________________________________
From: dealii@googlegroups.com <dealii@googlegroups.com> on behalf of Wolfgang 
Bangerth <bange...@colostate.edu>
Sent: Thursday, February 18, 2021 12:19 PM
To: dealii@googlegroups.com <dealii@googlegroups.com>
Subject: Re: [deal.II] Tags and Communicators


> One fix that I have found (PETSc does this) is to assign every object its
> own duplicated communicator which can then keep track of its own tags with
> MPI's own get and set attributes functions.

Sometime back in the day (in the 2005-2010 range) I spent an inordinate amount
of time going through every class that receives a communicator in some way or
other (in the constructor or a reinit() call) and having each of these classes
duplicate the communicator in the way you mention
(Utilities::MPI::duplicate_communicator() does that). You can probably still
find those patches if you look long enough.
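
For concreteness, the tag-attribute scheme you describe could be sketched
roughly like this -- this is not PETSc's actual code, and the function names
are made up; the idea is simply to store a "next free tag" counter on the
duplicated communicator via MPI's attribute caching:

  #include <mpi.h>
  #include <cstdint>

  namespace
  {
    int tag_keyval = MPI_KEYVAL_INVALID;
  }

  // Duplicate a communicator and attach a "next free tag" counter to it.
  MPI_Comm duplicate_with_tag_counter(const MPI_Comm comm)
  {
    if (tag_keyval == MPI_KEYVAL_INVALID)
      MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                             &tag_keyval, nullptr);

    MPI_Comm duplicate;
    MPI_Comm_dup(comm, &duplicate);

    // Store the counter directly in the attribute value (no allocation).
    MPI_Comm_set_attr(duplicate, tag_keyval,
                      reinterpret_cast<void *>(std::intptr_t(0)));
    return duplicate;
  }

  // Hand out a fresh tag on this communicator. All processes have to call
  // this in the same order so that they agree on the tags they get.
  int get_next_tag(const MPI_Comm comm)
  {
    void *value;
    int   flag;
    MPI_Comm_get_attr(comm, tag_keyval, &value, &flag);

    const int tag = static_cast<int>(reinterpret_cast<std::intptr_t>(value));
    MPI_Comm_set_attr(comm, tag_keyval,
                      reinterpret_cast<void *>(std::intptr_t(tag + 1)));
    return tag;
  }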

The reason I did this was primarily that I wanted to have one added layer of
security: if, for some reason, one process does not participate in a
collective communication, you should get a deadlock, whereas you would either
get funny errors or just completely unreliable results if that process
proceeds to the next communication on the same communicator. Right now, in
practice, all communication happens on MPI_COMM_WORLD.
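
To illustrate what goes wrong with a shared communicator, here is a made-up
example: one rank executes two unrelated reductions in a different order than
everyone else. On MPI_COMM_WORLD the calls still pair up, just with the wrong
data, so every rank silently gets wrong numbers; with one communicator per
object, the run would deadlock instead, which is far easier to diagnose:

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv)
  {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double object_a = 1.0, object_b = 100.0;
    double sum_a = 0.0, sum_b = 0.0;

    if (rank == 0)
      {
        // This rank happens to run "object B" before "object A"...
        MPI_Allreduce(&object_b, &sum_b, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        MPI_Allreduce(&object_a, &sum_a, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      }
    else
      {
        // ...while all other ranks run them in the opposite order. The calls
        // still match because they use the same communicator, but the data
        // does not.
        MPI_Allreduce(&object_a, &sum_a, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        MPI_Allreduce(&object_b, &sum_b, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      }

    std::printf("rank %d: sum_a=%g sum_b=%g\n", rank, sum_a, sum_b);
    MPI_Finalize();
  }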

But, after spending a good amount of time duplicating all of these
communicators (probably several days of work), I spent *an even larger amount
of time* tracking down what was quite likely the hardest-to-find bug I have
ever worked on. It only manifested in a crash after a program had been running
for many hours, and small changes to the program made the bug move to
unrelated pieces of the code, but did not make it appear any earlier in the
run time of the program.

In the end, what it boiled down to is that the MPI implementation I was using
was only able to provide 64k different communicators, and if you asked for the
64k+1'st communicator, it just crashed. In programs that do thousands of time
steps and allocate a few temporary vectors in each time step, you'd get there
in a matter of hours.

I had sort of expected that MPI implementations recycle released
communicators, and I would expect that that's what they do today, but they
didn't back then. So after all of this time spent, I ripped out the
duplication of communicators again. You can probably also find that patch in
the repository and, with some luck, you might even be able to revert it if you
allow some fuzz in indentation etc. It would certainly be interesting to try
this again.

I'm still in favor of this approach. It's conceptually the right thing, it
would help uncover bugs, and it is *necessary* if you want to do things
multithreaded. (All of the MPI implementations have reentrant public
interfaces these days, but we can't use them unless we also duplicate
communicators somewhere.) But I would first want to try with a small test
program whether that is scalable to very large numbers of communicators -- I
think we would quite easily run into millions of communicators with this
approach if programs run long enough, though of course only a rather small
number would be live at any given time.
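
A minimal version of such a test could look like the following (just a sketch
of what I have in mind): duplicate and immediately free a communicator many
times and see whether the implementation recycles the handles or eventually
fails:

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv)
  {
    MPI_Init(&argc, &argv);

    // Have MPI return error codes instead of aborting, so that we can
    // report when (if ever) duplication starts to fail.
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    for (unsigned int i = 0; i < 1000000; ++i)
      {
        MPI_Comm duplicate;
        if (MPI_Comm_dup(MPI_COMM_WORLD, &duplicate) != MPI_SUCCESS)
          {
            std::printf("MPI_Comm_dup failed after %u duplications\n", i);
            break;
          }

        // Free right away: only one duplicate is live at any given time,
        // so an implementation that recycles handles should never run out.
        MPI_Comm_free(&duplicate);
      }

    MPI_Finalize();
  }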

Best
  W.

--
------------------------------------------------------------------------
Wolfgang Bangerth          email:                 bange...@colostate.edu
                            www: http://www.math.colostate.edu/~bangerth/
