I know why it quite - M3EXIT was called - but thanks for looking.
On Wed, May 21, 2014 at 4:02 PM, Gus Correa <g...@ldeo.columbia.edu> wrote: > Hi Ben > > One of the ranks (52) called MPI_Abort. > This may be a bug in the code, or a problem with the setup > (e.g. a missing or incorrect input file). > For instance, the CCTM Wiki says: > "AERO6 expects emissions inputs for 13 new PM species. CCTM will crash if > any emitted PM species is not included in the emissions input file" > I am not familiar to CCTM, so these are just guesses. > > It doesn't look like an MPI problem, though. > > You may want to check any other logs that the CCTM code may > produce, for any clue on where it fails. > Otherwise, you could compile with -g -traceback (and remove any > optimization options in FFLAGS, FCFLAGS, CFLAGS, etc.) > It may also have a -DDEBUG or similar that can be turned on > in the CPPFLAGS, which in many models makes a more verbose log. > This *may* tell you where it fails (source file, subroutine and line), > and may help understand why it fails. > If it dumps a core file, you can trace the failure point with > a debugger. > > > I hope this helps, > Gus > > On 05/21/2014 03:20 PM, Ben Lash wrote: > >> I used a different build of netcdf 4.1.3, and the code seems to run now. >> I have a totally different, non-mpi related error in part of it, but >> there's no way for the list to help, I mostly just wanted to report that >> this particular problem seems to be solved for the record. It doesn't >> seem to fail quite as gracefully anymore, but I'm still getting enough >> of the error messages to know what's going on. >> >> MPI_ABORT was invoked on rank 52 in communicator MPI_COMM_WORLD >> with errorcode 0. >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. >> ------------------------------------------------------------ >> -------------- >> [cn-099.davinci.rice.edu:26185 <http://cn-099.davinci.rice.edu:26185>] >> >> [[63355,0],4]-[[63355,1],52] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-099.davinci.rice.edu:26185 <http://cn-099.davinci.rice.edu:26185>] >> >> [[63355,0],4]-[[63355,1],54] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-099.davinci.rice.edu:26185 <http://cn-099.davinci.rice.edu:26185>] >> >> [[63355,0],4]-[[63355,1],55] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-158.davinci.rice.edu:12459 <http://cn-158.davinci.rice.edu:12459>] >> >> [[63355,0],1]-[[63355,1],15] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-158.davinci.rice.edu:12459 <http://cn-158.davinci.rice.edu:12459>] >> >> [[63355,0],1]-[[63355,1],17] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-099.davinci.rice.edu:26185 <http://cn-099.davinci.rice.edu:26185>] >> >> [[63355,0],4]-[[63355,1],56] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-099.davinci.rice.edu:26185 <http://cn-099.davinci.rice.edu:26185>] >> >> [[63355,0],4]-[[63355,1],53] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-099.davinci.rice.edu:26185 <http://cn-099.davinci.rice.edu:26185>] >> >> [[63355,0],4]-[[63355,1],51] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> [cn-099.davinci.rice.edu:26185 <http://cn-099.davinci.rice.edu:26185>] >> >> [[63355,0],4]-[[63355,1],57] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> forrtl: error (78): process killed (SIGTERM) >> Image PC Routine Line Source >> >> .... >> >> [cn-158.davinci.rice.edu:12459 <http://cn-158.davinci.rice.edu:12459>] >> >> [[63355,0],1]-[[63355,1],16] mca_oob_tcp_msg_recv: readv failed: >> Connection reset by peer (104) >> ------------------------------------------------------------ >> -------------- >> mpiexec has exited due to process rank 49 with PID 26187 on >> node cn-099 exiting improperly. There are two reasons this could occur: >> >> 1. this process did not call "init" before exiting, but others in >> the job did. This can cause a job to hang indefinitely while it waits >> for all processes to call "init". By rule, if one process calls "init", >> then ALL processes must call "init" prior to termination. >> >> 2. this process called "init", but exited without calling "finalize". >> By rule, all processes that call "init" MUST call "finalize" prior to >> exiting or it will be considered an "abnormal termination" >> >> This may have caused other processes in the application to be >> terminated by signals sent by mpiexec (as reported here). >> ------------------------------------------------------------ >> -------------- >> forrtl: error (78): process killed (SIGTERM) >> Image PC Routine Line Source >> CCTM_V5g_Linux2_x 00000000007FEA29 Unknown Unknown >> Unknown >> CCTM_V5g_Linux2_x 00000000007FD3A0 Unknown Unknown >> Unknown >> CCTM_V5g_Linux2_x 00000000007BA9A2 Unknown Unknown >> Unknown >> CCTM_V5g_Linux2_x 0000000000759288 Unknown Unknown >> Unknown >> >> ... >> >> >> >> On Wed, May 21, 2014 at 2:08 PM, Gus Correa <g...@ldeo.columbia.edu >> <mailto:g...@ldeo.columbia.edu>> wrote: >> >> Hi Ben >> >> My guess is that your sys admins may have built NetCDF >> with parallel support, pnetcdf, and the latter with OpenMPI, >> which could explain the dependency. >> Ideally, they should have built it again with the latest default >> OpenMPI (1.6.5?) >> >> Check if there is a NetCDF module that either doesn't have any >> dependence on MPI, or depends on the current Open MPI that >> you are using (1.6.5 I think). >> A 'module show netcdf/bla/bla' >> on the available netcdf modules will tell. >> >> If the application code is old as you said, it probably doesn't use >> any pnetcdf. In addition, it should work even with NetCDF 3.X.Y, >> which probably doesn't have any pnetcdf built in. >> Newer netcdf (4.Z.W > 4.1.3) should also work, and in this case >> pick one that requires the default OpenMPI, if available. >> >> Just out of curiosity, besides netcdf/4.1.3, did you load >> openmpi/1.6.5? >> Somehow the openmpi/1.6.5 should have been marked >> to conflict with 1.4.4. >> Is it? >> Anyway, you may want to do a 'which mpiexec' to see which one is >> taking precedence in your environment (1.6.5 or 1.4.4) >> Probably 1.6.5. >> >> Does the code work now, or does it continue to fail? >> >> >> I hope this helps, >> Gus Correa >> >> >> >> On 05/21/2014 02:36 PM, Ben Lash wrote: >> >> Yep, there is is. >> >> [bl10@login2 USlogsminus10]$ module show netcdf/4.1.3 >> ------------------------------__---------------------------- >> --__------- >> /opt/apps/modulefiles/netcdf/__4.1.3: >> >> >> module load openmpi/1.4.4-intel >> prepend-path PATH >> /opt/apps/netcdf/4.1.3/bin:/__opt/apps/netcdf/4.1.3/deps/__ >> hdf5/1.8.7/bin >> prepend-path LD_LIBRARY_PATH >> /opt/apps/netcdf/4.1.3/lib:/__opt/apps/netcdf/4.1.3/deps/__ >> hdf5/1.8.7/lib:/opt/apps/__netcdf/4.1.3/deps/szip/2.1/lib >> >> prepend-path MANPATH /opt/apps/netcdf/4.1.3/share/__man >> ------------------------------__---------------------------- >> --__------- >> >> >> >> >> On Wed, May 21, 2014 at 1:34 PM, Douglas L Reeder >> <d...@centurylink.net <mailto:d...@centurylink.net> >> <mailto:d...@centurylink.net <mailto:d...@centurylink.net>>> >> wrote: >> >> Ben, >> >> The netcdf/4.1.3 module maybe loading the openmpi/1.4.4 >> module. Can >> you do module show the netcdf module file to to see if >> there is a >> module load openmpi command. >> >> Doug Reeder >> >> On May 21, 2014, at 12:23 PM, Ben Lash <b...@rice.edu >> <mailto:b...@rice.edu> >> <mailto:b...@rice.edu <mailto:b...@rice.edu>>> wrote: >> >> I just wanted to follow up for anyone else who got a >> similar >> problem - module load netcdf/4.1.3 *also* loaded >> openmpi/1.4.4. <http://1.4.4.> >> <http://1.4.4./> Don't ask me why. My code doesn't seem >> to fail as >> >> gracefully but otherwise works now. Thanks. >> >> >> On Sat, May 17, 2014 at 6:02 AM, Jeff Squyres (jsquyres) >> <jsquy...@cisco.com <mailto:jsquy...@cisco.com> >> <mailto:jsquy...@cisco.com <mailto:jsquy...@cisco.com>>> >> wrote: >> >> Ditto -- Lmod looks pretty cool. Thanks for the >> heads up. >> >> >> On May 16, 2014, at 6:23 PM, Douglas L Reeder >> <d...@centurylink.net <mailto:d...@centurylink.net> >> <mailto:d...@centurylink.net <mailto:d...@centurylink.net>>> >> >> wrote: >> >> > Maxime, >> > >> > I was unaware of Lmod. Thanks for bringing it to >> my attention. >> > >> > Doug >> > On May 16, 2014, at 4:07 PM, Maxime Boissonneault >> <maxime.boissonneault@__calculquebec.ca >> <mailto:maxime.boissonnea...@calculquebec.ca> >> <mailto:maxime.boissonneault@__calculquebec.ca >> <mailto:maxime.boissonnea...@calculquebec.ca>>> wrote: >> > >> >> Instead of using the outdated and not maintained >> Module >> environment, why not use Lmod : >> https://www.tacc.utexas.edu/__tacc-projects/lmod >> >> <https://www.tacc.utexas.edu/tacc-projects/lmod> >> >> >> >> It is a drop-in replacement for Module >> environment that >> supports all of their features and much, much more, >> such as : >> >> - module hierarchies >> >> - module properties and color highlighting (we >> use it to >> higlight bioinformatic modules or tools for example) >> >> - module caching (very useful for a parallel >> filesystem >> with tons of modules) >> >> - path priorities (useful to make sure personal >> modules >> take precendence over system modules) >> >> - export module tree to json >> >> >> >> It works like a charm, understand both TCL and >> Lua modules >> and is actively developped and debugged. There are >> litteraly >> new features every month or so. If it does not do >> what you >> want, odds are that the developper will add it >> shortly (I've >> had it happen). >> >> >> >> Maxime >> >> >> >> Le 2014-05-16 17:58, Douglas L Reeder a écrit : >> >>> Ben, >> >>> >> >>> You might want to use module (source forge) to >> manage >> paths to different mpi implementations. It is >> fairly easy to >> set up and very robust for this type of problem. >> You would >> remove contentious application paths from you >> standard PATH >> and then use module to switch them in and out as >> needed. >> >>> >> >>> Doug Reeder >> >>> On May 16, 2014, at 3:39 PM, Ben Lash >> <b...@rice.edu <mailto:b...@rice.edu> >> <mailto:b...@rice.edu <mailto:b...@rice.edu>>> >> wrote: >> >>> >> >>>> My cluster has just upgraded to a new version >> of MPI, and >> I'm using an old one. It seems that I'm having >> trouble >> compiling due to the compiler wrapper file moving >> (full error >> here: http://pastebin.com/EmwRvCd9) >> >>>> "Cannot open configuration file >> >> /opt/apps/openmpi/1.4.4-intel/__share/openmpi/mpif90- >> wrapper-__data.txt" >> >> >>>> >> >>>> I've found the file on the cluster at >> >> /opt/apps/openmpi/retired/1.4.__4-intel/share/openmpi/ >> mpif90-__wrapper-data.txt >> >> >>>> How do I tell the old mpi wrapper where this >> file is? >> >>>> I've already corrected one link to mpich -> >> /opt/apps/openmpi/retired/1.4.__4-intel/, which is >> >> in the >> software I'm trying to recompile's lib folder >> (/home/bl10/CMAQv5.0.1/lib/__x86_64/ifort). Thanks >> >> for any >> ideas. I also tried changing $pkgdatadir based on >> what I read >> here: >> >>>> >> http://www.open-mpi.org/faq/?__category=mpi-apps#default-__ >> wrapper-compiler-flags >> >> <http://www.open-mpi.org/faq/?category=mpi-apps#default- >> wrapper-compiler-flags> >> >>>> >> >>>> Thanks. >> >>>> >> >>>> --Ben L >> >>>> ______________________________ >> ___________________ >> >> >>>> users mailing list >> >>>> us...@open-mpi.org <mailto:us...@open-mpi.org> >> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>> >> >> >>>> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >>> >> >>> >> >>> >> >>> ______________________________ >> ___________________ >> >>> users mailing list >> >>> >> >>> us...@open-mpi.org <mailto:us...@open-mpi.org> >> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>> >> >> >>> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> >> -- >> >> ------------------------------__--- >> >> Maxime Boissonneault >> >> Analyste de calcul - Calcul Québec, Université >> Laval >> >> Ph. D. en physique >> >> >> >> _________________________________________________ >> >> >> users mailing list >> >> us...@open-mpi.org <mailto:us...@open-mpi.org> >> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>> >> >> >> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> > >> > _________________________________________________ >> > users mailing list >> > us...@open-mpi.org <mailto:us...@open-mpi.org> >> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>> >> >> > >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com <mailto:jsquy...@cisco.com> >> <mailto:jsquy...@cisco.com <mailto:jsquy...@cisco.com>> >> >> >> For corporate legal information go to: >> http://www.cisco.com/web/__about/doing_business/legal/__cri/ >> <http://www.cisco.com/web/about/doing_business/legal/cri/> >> >> _________________________________________________ >> >> users mailing list >> us...@open-mpi.org <mailto:us...@open-mpi.org> >> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>> >> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> -- >> --Ben L >> _________________________________________________ >> >> users mailing list >> us...@open-mpi.org <mailto:us...@open-mpi.org> >> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> _________________________________________________ >> users mailing list >> us...@open-mpi.org <mailto:us...@open-mpi.org> >> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>> >> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> -- >> --Ben L >> >> >> _________________________________________________ >> >> users mailing list >> us...@open-mpi.org <mailto:us...@open-mpi.org> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> _________________________________________________ >> >> users mailing list >> us...@open-mpi.org <mailto:us...@open-mpi.org> >> http://www.open-mpi.org/__mailman/listinfo.cgi/users >> >> <http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> -- >> --Ben L >> >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > -- --Ben L