The app is not calling MPI_ABORT directly. I dug a little deeper into it
but didn't find anything interesting: the app simply fails to find the
subdirectory it uses for output (the internal error variable is 0) and
crashes when returning from the subroutine. It was just me not setting
things up properly, and everything seems to be working fine now.
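For anyone hitting the same thing, here is a minimal sketch (in C; the
directory name and error code are made up for illustration) of the kind of
guard that avoids the crash: check that the output subdirectory exists and,
if it doesn't, abort with a deliberate error code so the MPI_ABORT message
reports something meaningful:

    /* Minimal sketch: verify the (hypothetical) "output" subdirectory
     * exists before writing, and abort with an explicit error code
     * instead of crashing on return from the output routine. */
    #include <mpi.h>
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        struct stat sb;
        if (stat("output", &sb) != 0 || !S_ISDIR(sb.st_mode)) {
            fprintf(stderr, "output subdirectory not found; aborting\n");
            /* The 2nd argument is the error code reported in the
             * MPI_ABORT message; pass something deliberate here. */
            MPI_Abort(MPI_COMM_WORLD, 2);
        }

        /* ... normal run ... */
        MPI_Finalize();
        return 0;
    }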
Jeff Squyres (jsquyres) wrote:
Is your app calling MPI_Abort directly? The 2nd argument to MPI_ABORT is an
error code that should be passed through to the output message. If it's
not, we should investigate that.
Or is your app aborting in some other, indirect way? If so, perhaps
somehow that 2nd argument is getting dropped somewhere along the way, and
the number you're seeing in the message is effectively an uninitialized
integer. That's probably not *too* alarming in this case (because you're
aborting, after all), but it would probably be good to understand that code
path and fix it up if there's something wrong.
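As a minimal sketch (in C, with hypothetical names) of how that failure
mode could look: an error variable that is never assigned before reaching
MPI_Abort would report whatever happens to be on the stack:

    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical sketch of an indirect abort path: "status" is never
     * assigned before the abort, so the error code in the MPI_ABORT
     * message is effectively an uninitialized integer. */
    static void write_output(int dir_found)
    {
        int status;                     /* never initialized */
        if (!dir_found) {
            fprintf(stderr, "output directory missing\n");
            MPI_Abort(MPI_COMM_WORLD, status);  /* arbitrary high number */
        }
        /* ... otherwise write the output files ... */
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        write_output(0);                /* simulate the missing directory */
        MPI_Finalize();
        return 0;
    }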
On Jan 30, 2021, at 11:30 AM, Arturo Fernandez <afernan...@odyhpc.com> wrote:
Hi Jeff. Sorry for the delay. It took a while, but I was finally able to
track down the point where the app breaks down. The problem seems to
originate in an output subroutine, not in any malfunctioning MPI
communication. My guess is that MPI_Abort needs to produce some error
message. Why the high number? Not sure. Thanks.
Arturo
--
Jeff Squyres
jsquy...@cisco.com