I know why it quit - M3EXIT was called - but thanks for looking.

On Wed, May 21, 2014 at 4:02 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Ben
>
> One of the ranks (52) called MPI_Abort.
> This may be a bug in the code, or a problem with the setup
> (e.g. a missing or incorrect input file).
> For instance, the CCTM Wiki says:
> "AERO6 expects emissions inputs for 13 new PM species. CCTM will crash if
> any emitted PM species is not included in the emissions input file"
> I am not familiar with CCTM, so these are just guesses.
>
> It doesn't look like an MPI problem, though.
>
> You may want to check any other logs that the CCTM code may
> produce, for any clue about where it fails.
> Otherwise, you could compile with -g -traceback (and remove any
> optimization options from FFLAGS, FCFLAGS, CFLAGS, etc.).
> There may also be a -DDEBUG or similar macro that can be turned on
> in CPPFLAGS, which in many models produces a more verbose log.
> This *may* tell you where it fails (source file, subroutine, and line),
> and may help you understand why it fails.
> If it dumps a core file, you can trace the failure point with
> a debugger.
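>
> As a rough sketch (assuming Intel compilers and the usual
> Makefile-style variables; the CCTM build scripts may spell these
> differently, so treat the names as placeholders):
>
>   # rebuild with debug symbols, tracebacks, and no optimization
>   make clean
>   make FFLAGS="-g -O0 -traceback" FCFLAGS="-g -O0 -traceback" \
>        CFLAGS="-g -O0" CPPFLAGS="-DDEBUG"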
>
>
> I hope this helps,
> Gus
>
> On 05/21/2014 03:20 PM, Ben Lash wrote:
>
>> I used a different build of netcdf 4.1.3, and the code seems to run now.
>> I have a totally different, non-MPI-related error in part of it, but
>> there's no way for the list to help with that; I mostly just wanted to
>> report that this particular problem seems to be solved, for the record.
>> It doesn't seem to fail quite as gracefully anymore, but I'm still
>> getting enough of the error messages to know what's going on.
>>
>> MPI_ABORT was invoked on rank 52 in communicator MPI_COMM_WORLD
>> with errorcode 0.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> [cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],52] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],54] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],55] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],15] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],17] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],56] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],53] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],51] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],57] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> forrtl: error (78): process killed (SIGTERM)
>> Image              PC                Routine            Line        Source
>>
>> ....
>>
>> [cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],16] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> --------------------------------------------------------------------------
>> mpiexec has exited due to process rank 49 with PID 26187 on
>> node cn-099 exiting improperly. There are two reasons this could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpiexec (as reported here).
>> --------------------------------------------------------------------------
>> forrtl: error (78): process killed (SIGTERM)
>> Image              PC                Routine            Line        Source
>> CCTM_V5g_Linux2_x  00000000007FEA29  Unknown               Unknown     Unknown
>> CCTM_V5g_Linux2_x  00000000007FD3A0  Unknown               Unknown     Unknown
>> CCTM_V5g_Linux2_x  00000000007BA9A2  Unknown               Unknown     Unknown
>> CCTM_V5g_Linux2_x  0000000000759288  Unknown               Unknown     Unknown
>>
>> ...
>>
>>
>>
>> On Wed, May 21, 2014 at 2:08 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>     Hi Ben
>>
>>     My guess is that your sys admins may have built NetCDF
>>     with parallel support (pnetcdf), and the latter with OpenMPI,
>>     which could explain the dependency.
>>     Ideally, they should rebuild it against the latest default
>>     OpenMPI (1.6.5?).
>>
>>     Check if there is a NetCDF module that either doesn't have any
>>     dependence on MPI, or depends on the current Open MPI that
>>     you are using (1.6.5 I think).
>>     A  'module show netcdf/bla/bla'
>>     on the available netcdf modules will tell.
>>
>>     If the application code is as old as you said, it probably doesn't use
>>     any pnetcdf. In addition, it should work even with NetCDF 3.X.Y,
>>     which probably doesn't have any pnetcdf built in.
>>     Newer netcdf (4.Z.W > 4.1.3) should also work, and in this case
>>     pick one that requires the default OpenMPI, if available.
>>
>>     Just out of curiosity, besides netcdf/4.1.3, did you load
>>     openmpi/1.6.5?
>>     The openmpi/1.6.5 module should have been marked to conflict
>>     with 1.4.4. Is it?
>>     Anyway, you may want to do a 'which mpiexec' to see which one is
>>     taking precedence in your environment (1.6.5 or 1.4.4).
>>     Probably 1.6.5.
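>>
>>     For instance (the path below is only an illustration of what
>>     you might see, not your actual layout):
>>
>>         $ which mpiexec
>>         /opt/apps/openmpi/1.6.5-intel/bin/mpiexec
>>
>>     If it points into the retired 1.4.4 tree instead, the module
>>     load order is the culprit.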
>>
>>     Does the code work now, or does it continue to fail?
>>
>>
>>     I hope this helps,
>>     Gus Correa
>>
>>
>>
>>     On 05/21/2014 02:36 PM, Ben Lash wrote:
>>
>>         Yep, there it is.
>>
>>         [bl10@login2 USlogsminus10]$ module show netcdf/4.1.3
>>         -------------------------------------------------------------------
>>         /opt/apps/modulefiles/netcdf/4.1.3:
>>
>>         module           load openmpi/1.4.4-intel
>>         prepend-path     PATH /opt/apps/netcdf/4.1.3/bin:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/bin
>>         prepend-path     LD_LIBRARY_PATH /opt/apps/netcdf/4.1.3/lib:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/lib:/opt/apps/netcdf/4.1.3/deps/szip/2.1/lib
>>         prepend-path     MANPATH /opt/apps/netcdf/4.1.3/share/man
>>         -------------------------------------------------------------------
>>
>>
>>
>>
>>         On Wed, May 21, 2014 at 1:34 PM, Douglas L Reeder
>>         <d...@centurylink.net> wrote:
>>
>>              Ben,
>>
>>              The netcdf/4.1.3 module may be loading the openmpi/1.4.4
>>              module. Can you do a 'module show' on the netcdf module file
>>              to see if there is a 'module load openmpi' command?
>>
>>              Doug Reeder
>>
>>              On May 21, 2014, at 12:23 PM, Ben Lash <b...@rice.edu> wrote:
>>
>>                  I just wanted to follow up for anyone else who got a
>>                  similar problem - module load netcdf/4.1.3 *also* loaded
>>                  openmpi/1.4.4. Don't ask me why. My code doesn't seem to
>>                  fail as gracefully but otherwise works now. Thanks.
>>
>>
>>                  On Sat, May 17, 2014 at 6:02 AM, Jeff Squyres (jsquyres)
>>                  <jsquy...@cisco.com> wrote:
>>
>>                      Ditto -- Lmod looks pretty cool.  Thanks for the
>>             heads up.
>>
>>
>>                      On May 16, 2014, at 6:23 PM, Douglas L Reeder
>>                      <d...@centurylink.net> wrote:
>>
>>                      > Maxime,
>>                      >
>>                      > I was unaware of Lmod. Thanks for bringing it to
>>             my attention.
>>                      >
>>                      > Doug
>>                      > On May 16, 2014, at 4:07 PM, Maxime Boissonneault
>>                      > <maxime.boissonneault@calculquebec.ca> wrote:
>>                      >
>>                      >> Instead of using the outdated and unmaintained
>>                      >> Module environment, why not use Lmod:
>>                      >> https://www.tacc.utexas.edu/tacc-projects/lmod
>>                      >>
>>                      >> It is a drop-in replacement for the Module
>>                      >> environment that supports all of its features and
>>                      >> much, much more, such as:
>>                      >> - module hierarchies
>>                      >> - module properties and color highlighting (we use
>>                      >>   it to highlight bioinformatic modules or tools,
>>                      >>   for example)
>>                      >> - module caching (very useful for a parallel
>>                      >>   filesystem with tons of modules)
>>                      >> - path priorities (useful to make sure personal
>>                      >>   modules take precedence over system modules)
>>                      >> - export of the module tree to JSON
>>                      >>
>>                      >> It works like a charm, understands both TCL and
>>                      >> Lua modules, and is actively developed and
>>                      >> debugged. There are literally new features every
>>                      >> month or so. If it does not do what you want, odds
>>                      >> are that the developer will add it shortly (I've
>>             had it happen).
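>>                      >>
>>                      >> As a small made-up example of what a hierarchy
>>                      >> looks like in practice (module names will differ
>>                      >> per site):
>>                      >>
>>                      >>     $ module load intel/14.0
>>                      >>     $ module avail   # now lists only MPI builds for intel/14.0
>>                      >>     $ module load openmpi/1.6.5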
>>                      >>
>>                      >> Maxime
>>                      >>
>>                      >>> On 2014-05-16 17:58, Douglas L Reeder wrote:
>>                      >>> Ben,
>>                      >>>
>>                      >>> You might want to use module (SourceForge) to
>>                      >>> manage paths to different MPI implementations. It
>>                      >>> is fairly easy to set up and very robust for this
>>                      >>> type of problem. You would remove contentious
>>                      >>> application paths from your standard PATH and then
>>                      >>> use module to switch them in and out as needed.
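>>                      >>>
>>                      >>> For example (the module names below are just
>>                      >>> placeholders for whatever your site provides):
>>                      >>>
>>                      >>>     module unload openmpi/1.4.4-intel
>>                      >>>     module load openmpi/1.6.5-intel
>>                      >>>     which mpif90    # confirm the right wrapper won
>>                      >>>
>>                      >>> so each shell sees exactly one MPI at a time.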
>>                      >>>
>>                      >>> Doug Reeder
>>                      >>> On May 16, 2014, at 3:39 PM, Ben Lash
>>                      >>> <b...@rice.edu> wrote:
>>                      >>>
>>                      >>>> My cluster has just upgraded to a new version
>>                      >>>> of MPI, and I'm using an old one. It seems that
>>                      >>>> I'm having trouble compiling due to the compiler
>>                      >>>> wrapper file moving (full error here:
>>                      >>>> http://pastebin.com/EmwRvCd9)
>>                      >>>> "Cannot open configuration file
>>                      >>>> /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
>>
>>                      >>>>
>>                      >>>> I've found the file on the cluster at
>>                      >>>> /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
>>
>>                      >>>> How do I tell the old MPI wrapper where this
>>                      >>>> file is? I've already corrected one link, mpich ->
>>                      >>>> /opt/apps/openmpi/retired/1.4.4-intel/, which is
>>                      >>>> in the lib folder of the software I'm trying to
>>                      >>>> recompile (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort).
>>                      >>>> Thanks for any ideas. I also tried changing
>>                      >>>> $pkgdatadir based on what I read here:
>>                      >>>> http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
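>>                      >>>>
>>                      >>>> What I actually tried was along these lines (not
>>                      >>>> sure this is even the right knob, so treat it as
>>                      >>>> a guess):
>>                      >>>>
>>                      >>>>     export OPAL_PREFIX=/opt/apps/openmpi/retired/1.4.4-intel
>>                      >>>>     mpif90 --showme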
>>                      >>>>
>>                      >>>> Thanks.
>>                      >>>>
>>                      >>>> --Ben L
>>                      >>
>>                      >>
>>                      >> --
>>                      >> ---------------------------------
>>                      >> Maxime Boissonneault
>>                      >> Computing analyst - Calcul Québec, Université Laval
>>                      >> Ph.D. in physics
>>
>>
>>                      --
>>                      Jeff Squyres
>>                      jsquy...@cisco.com
>>                      For corporate legal information go to:
>>                      http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>>
>>
>>
>>                  --
>>                  --Ben L
>>
>>
>>
>>
>>
>>
>>
>>         --
>>         --Ben L
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> --Ben L
>>
>>
>>
>>
>
>


-- 
--Ben L
