Hi Ben

One of the ranks (52) called MPI_Abort.
This may be a bug in the code, or a problem with the setup
(e.g. a missing or incorrect input file).
For instance, the CCTM Wiki says:
"AERO6 expects emissions inputs for 13 new PM species. CCTM will crash if any emitted PM species is not included in the emissions input file"
I am not familiar with CCTM, so these are just guesses.

It doesn't look like an MPI problem, though.

You may want to check any other logs that the CCTM code
produces, for clues about where it fails.
Otherwise, you could compile with -g -traceback (and remove any
optimization options from FFLAGS, FCFLAGS, CFLAGS, etc.).
The code may also have a -DDEBUG or similar macro that can be turned on
in CPPFLAGS, which in many models produces a more verbose log.
This *may* tell you where it fails (source file, subroutine, and line),
and may help you understand why it fails.
If it dumps a core file, you can trace the failure point with
a debugger.
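Something along these lines (just a sketch; the actual variable names in
the CCTM build scripts may differ, and -traceback is an Intel-compiler
flag, which your forrtl messages suggest you are using):

  # Build configuration: debug flags instead of optimization.
  FFLAGS   = -g -traceback -O0
  FCFLAGS  = -g -traceback -O0
  CFLAGS   = -g -O0
  CPPFLAGS = -DDEBUG             # only if the model defines such a macro

  # Job script: allow core dumps, then inspect one with a debugger.
  ulimit -c unlimited
  gdb ./CCTM_V5g_Linux2_x core   # 'bt' prints the backtrace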

I hope this helps,
Gus

On 05/21/2014 03:20 PM, Ben Lash wrote:
I used a different build of netcdf 4.1.3, and the code seems to run now.
I have a totally different, non-MPI-related error in part of it, but
there's no way for the list to help with that; I mostly just wanted to
report, for the record, that this particular problem seems to be solved.
It doesn't seem to fail quite as gracefully anymore, but I'm still
getting enough of the error messages to know what's going on.

MPI_ABORT was invoked on rank 52 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],52] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],54] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],55] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],15] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],17] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],56] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],53] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],51] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],57] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source

....

[cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],16] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpiexec has exited due to process rank 49 with PID 26187 on
node cn-099 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
CCTM_V5g_Linux2_x  00000000007FEA29  Unknown               Unknown  Unknown
CCTM_V5g_Linux2_x  00000000007FD3A0  Unknown               Unknown  Unknown
CCTM_V5g_Linux2_x  00000000007BA9A2  Unknown               Unknown  Unknown
CCTM_V5g_Linux2_x  0000000000759288  Unknown               Unknown  Unknown

...



On Wed, May 21, 2014 at 2:08 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

    Hi Ben

    My guess is that your sys admins may have built NetCDF
    with parallel support (pnetcdf), and pnetcdf in turn with OpenMPI,
    which could explain the dependency.
    Ideally, they should have rebuilt it against the latest default
    OpenMPI (1.6.5?).

    Check if there is a NetCDF module that either doesn't have any
    dependence on MPI, or depends on the current Open MPI that
    you are using (1.6.5, I think).
    A 'module show netcdf/bla/bla'
    on the available netcdf modules will tell you.
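    For example ('netcdf/bla/bla' standing in for each module listed):

      module avail netcdf                            # list the candidates
      module show netcdf/bla/bla 2>&1 | grep -i mpi  # any MPI dependency?
      # ('module show' prints to stderr, hence the 2>&1 before grep)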

    If the application code is as old as you said, it probably doesn't
    use pnetcdf at all. In that case it should work even with NetCDF 3.X.Y,
    which probably doesn't have any pnetcdf built in.
    Newer netcdf (4.Z.W > 4.1.3) should also work; in that case
    pick one that requires the default OpenMPI, if available.

    Just out of curiosity: besides netcdf/4.1.3, did you load openmpi/1.6.5?
    The openmpi/1.6.5 module should have been marked
    to conflict with 1.4.4.
    Is it?
    Anyway, you may want to do a 'which mpiexec' to see which one
    takes precedence in your environment (1.6.5 or 1.4.4).
    Probably 1.6.5.
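    For instance:

      which mpiexec       # whose mpiexec comes first in your PATH?
      mpiexec --version   # Open MPI reports its version here
      module list         # any stray openmpi/1.4.4 still loaded?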

    Does the code work now, or does it continue to fail?


    I hope this helps,
    Gus Correa



    On 05/21/2014 02:36 PM, Ben Lash wrote:

        Yep, there it is.

        [bl10@login2 USlogsminus10]$ module show netcdf/4.1.3
        -------------------------------------------------------------------
        /opt/apps/modulefiles/netcdf/4.1.3:

        module           load openmpi/1.4.4-intel
        prepend-path     PATH /opt/apps/netcdf/4.1.3/bin:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/bin
        prepend-path     LD_LIBRARY_PATH /opt/apps/netcdf/4.1.3/lib:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/lib:/opt/apps/netcdf/4.1.3/deps/szip/2.1/lib
        prepend-path     MANPATH /opt/apps/netcdf/4.1.3/share/man
        -------------------------------------------------------------------



        On Wed, May 21, 2014 at 1:34 PM, Douglas L Reeder
        <d...@centurylink.net> wrote:

             Ben,

             The netcdf/4.1.3 module may be loading the openmpi/1.4.4
         module. Can you do a 'module show' on the netcdf module file
         to see if there is a 'module load openmpi' command?

             Doug Reeder

             On May 21, 2014, at 12:23 PM, Ben Lash <b...@rice.edu> wrote:

                 I just wanted to follow up for anyone else who got a
             similar problem - module load netcdf/4.1.3 *also* loaded
             openmpi/1.4.4. Don't ask me why. My code doesn't seem to
             fail as gracefully but otherwise works now. Thanks.


                 On Sat, May 17, 2014 at 6:02 AM, Jeff Squyres (jsquyres)
                 <jsquy...@cisco.com> wrote:

                     Ditto -- Lmod looks pretty cool.  Thanks for the
            heads up.


                     On May 16, 2014, at 6:23 PM, Douglas L Reeder
                     <d...@centurylink.net> wrote:

                     > Maxime,
                     >
                     > I was unaware of Lmod. Thanks for bringing it to
            my attention.
                     >
                     > Doug
                     > On May 16, 2014, at 4:07 PM, Maxime Boissonneault
                     <maxime.boissonneault@calculquebec.ca> wrote:
                     >
                     >> Instead of using the outdated and unmaintained
             Module environment, why not use Lmod:
             https://www.tacc.utexas.edu/tacc-projects/lmod
                     >>
                     >> It is a drop-in replacement for the Module
             environment that supports all of their features and much,
             much more, such as:
                     >> - module hierarchies
                     >> - module properties and color highlighting (we
             use it to highlight bioinformatic modules or tools, for example)
                     >> - module caching (very useful for a parallel
            filesystem
                     with tons of modules)
                     >> - path priorities (useful to make sure personal
             modules take precedence over system modules)
                     >> - export module tree to json
                     >>
                     >> It works like a charm, understands both TCL and
             Lua modules, and is actively developed and debugged. There are
             literally new features every month or so. If it does not do
             what you want, odds are that the developer will add it
             shortly (I've had it happen).
                     >>
                     >> Maxime
                     >>
                     >> On 2014-05-16 17:58, Douglas L Reeder wrote:
                     >>> Ben,
                     >>>
                     >>> You might want to use module (source forge) to
             manage paths to different mpi implementations. It is
             fairly easy to set up and very robust for this type of
             problem. You would remove contentious application paths
             from your standard PATH and then use module to switch
             them in and out as needed.
                     >>>
                     >>> Doug Reeder
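             (As a sketch of that workflow, using this cluster's module
             names and assuming the new stack is called openmpi/1.6.5:

               module unload openmpi/1.4.4-intel
               module load openmpi/1.6.5
               # or, in one step:
               module swap openmpi/1.4.4-intel openmpi/1.6.5
             )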
                     >>> On May 16, 2014, at 3:39 PM, Ben Lash
             <b...@rice.edu> wrote:
                     >>>
                     >>>> My cluster has just upgraded to a new version
            of MPI, and
                     I'm using an old one. It seems that I'm having trouble
                     compiling due to the compiler wrapper file moving
            (full error
                     here: http://pastebin.com/EmwRvCd9)
                     >>>> "Cannot open configuration file

            
/opt/apps/openmpi/1.4.4-intel/__share/openmpi/mpif90-wrapper-__data.txt"
                     >>>>
                     >>>> I've found the file on the cluster at
             /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
                     >>>> How do I tell the old mpi wrapper where this
             file is?
                     >>>> I've already corrected one link to mpich ->
             /opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib
             folder of the software I'm trying to recompile
             (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks for any
             ideas. I also tried changing $pkgdatadir based on what I
             read here:
                     >>>> http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
                     >>>>
                     >>>> Thanks.
                     >>>>
                     >>>> --Ben L
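             (One way to do that, as a sketch: Open MPI can take its
             installation paths from environment overrides, so pointing
             OPAL_PREFIX at the relocated tree should let the wrapper
             find its data file again, assuming this 1.4.4 build honors
             the override:

               export OPAL_PREFIX=/opt/apps/openmpi/retired/1.4.4-intel
               export PATH=$OPAL_PREFIX/bin:$PATH
               export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH
               mpif90 --showme   # cheap test: prints the underlying compile line
             )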
                     >>
                     >>
                     >> --
                     >> ---------------------------------
                     >> Maxime Boissonneault
                     >> Computing Analyst - Calcul Québec, Université Laval
                     >> Ph.D. in Physics
                     >>
                     >


                     --
                     Jeff Squyres
                     jsquy...@cisco.com
                     For corporate legal information go to:
                     http://www.cisco.com/web/about/doing_business/legal/cri/





                 --
                 --Ben L







        --
        --Ben L








--
--Ben L



