Well, this is a little strange. The hanging behavior is gone, but I'm getting a segfault now. The outputs of "hello_c.c" and "ring_c.c" are attached.
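(For reference, hello_c.c and ring_c.c are the stock C examples from the examples/ directory of the 1.6.5 tarball. In case the exact contents matter, hello_c.c is essentially the following - a rough sketch from memory, so the real file may differ slightly in its output text:)

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, size;

        /* Initialize MPI, report this rank's position, and shut down. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello, world, I am %d of %d\n", rank, size);
        MPI_Finalize();

        return 0;
    }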
I'm getting a segfault with the Fortran test, also.

I'm afraid I may have polluted the experiment by removing the target openmpi-1.6.5 installation directory yesterday. To produce the attached outputs, I just went back and did "make install" in the openmpi-1.6.5 build directory. I've re-set the environment variables as they were a few days ago by sourcing the same bash script. Perhaps I forgot something, or something on the system changed? Regardless, LD_LIBRARY_PATH and PATH are set correctly, and the aberrant behavior persists.

The reason for deleting the openmpi-1.6.5 installation was that I went back and installed openmpi-1.4.3, and the problem (mostly) went away. Openmpi-1.4.3 can run the simple tests without issue, but on my "real" program I'm getting symbol lookup errors:

    mca_paffinity_linux.so: undefined symbol: mca_base_param_reg_int

Perhaps that's a separate thread.

>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
>Sent: Tuesday, January 21, 2014 3:57 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] simple test problem hangs on mpi_finalize and consumes all system resources
>
>Just for giggles, can you repeat the same test but with hello_c.c and ring_c.c? I.e., let's get the Fortran out of the way and use just the base C bindings, and see what happens.
>
>On Jan 19, 2014, at 6:18 PM, "Fischer, Greg A." <fisch...@westinghouse.com> wrote:
>
>> I just tried running "hello_f90.f90" and see the same behavior: 100% CPU usage, gradually increasing memory consumption, and failure to get past mpi_finalize. LD_LIBRARY_PATH is set as:
>>
>> /tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5/lib
>>
>> The installation target for this version of OpenMPI is:
>>
>> /tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5
>>
>> 1045 fischega@lxlogin2[/data/fischega/petsc_configure/mpi_test/simple]> which mpirun
>> /tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5/bin/mpirun
>>
>> Perhaps something strange is happening with GCC? I've tried simple hello world C and Fortran programs, and they work normally.
>>
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Sunday, January 19, 2014 11:36 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] simple test problem hangs on mpi_finalize and consumes all system resources
>>
>> The OFED warning about registration is something OMPI added at one point when we isolated the cause of jobs occasionally hanging, so you won't see that warning from other MPIs or earlier versions of OMPI (I forget exactly when we added it).
>>
>> The problem you describe doesn't sound like an OMPI issue - it sounds like you've got a memory corruption problem in the code. Have you tried running the examples in our example directory to confirm that the installation is good?
>>
>> Also, check to ensure that your LD_LIBRARY_PATH is correctly set to pick up the OMPI libs you installed - most Linux distros come with an older version, and that can cause problems if you inadvertently pick them up.
>>
>> On Jan 19, 2014, at 5:51 AM, Fischer, Greg A. <fisch...@westinghouse.com> wrote:
>>
>> Hello,
>>
>> I have a simple, 1-process test case that gets stuck on the mpi_finalize call. The test case is a dead-simple calculation of pi - 50 lines of Fortran. The process gradually consumes more and more memory until the system becomes unresponsive and needs to be rebooted, unless the job is killed first.
>>
>> In the output, attached, I see the warning message about OpenFabrics being configured to only allow registering part of physical memory. I've tried to chase this down with my administrator to no avail yet. (I am aware of the relevant FAQ entry.) A different installation of MPI on the same system, made with a different compiler, does not produce the OpenFabrics memory registration warning - which seems strange because I thought it was a system configuration issue independent of MPI. Also curious in the output is that LSF seems to think there are 7 processes and 11 threads associated with this job.
>>
>> The particulars of my configuration are attached and detailed below. Does anyone see anything potentially problematic?
>>
>> Thanks,
>> Greg
>>
>> OpenMPI Version: 1.6.5
>> Compiler: GCC 4.6.1
>> OS: SuSE Linux Enterprise Server 10, Patchlevel 2
>>
>> uname -a : Linux lxlogin2 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
>>
>> LD_LIBRARY_PATH=/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5/lib:/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/gcc-4.6.1/lib64:/tools/lsf/7.0.6.EC/7.0/linux2.6-glibc2.3-x86_64/lib
>>
>> PATH=/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/python-2.7.6/bin:/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5/bin:/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/gcc-4.6.1/bin:/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/git-1.7.0.4/bin:/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/cmake-2.8.11.2/bin:/tools/lsf/7.0.6.EC/7.0/linux2.6-glibc2.3-x86_64/etc:/tools/lsf/7.0.6.EC/7.0/linux2.6-glibc2.3-x86_64/bin:/usr/bin:.:/bin:/usr/scripts
>>
>> Execution command: (executed via LSF - effectively "mpirun -np 1 test_program")
>>
>> <output.txt><config.log.bz2><ompi_info.bz2>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>--
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users
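For completeness, ring_c.c - the other example Jeff suggested - does roughly the following: each rank passes an integer token to its neighbor in a ring, rank 0 decrements the token on each lap, and everyone leaves the loop and calls MPI_Finalize once the token reaches zero. Again, this is a from-memory sketch rather than the exact file:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, size, next, prev, message, tag = 201;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Neighbors in the ring (with -np 1, next == prev == 0, so the
           rank talks to itself) */
        next = (rank + 1) % size;
        prev = (rank + size - 1) % size;

        /* Rank 0 injects the token */
        if (rank == 0) {
            message = 10;
            MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        }

        /* Pass the token around; rank 0 decrements it each time it
           comes back around */
        while (1) {
            MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (rank == 0) {
                message--;
            }
            MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
            if (message == 0) {
                break;
            }
        }

        /* Rank 0 drains the final token so nothing is left in flight */
        if (rank == 0) {
            MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

The point being that neither test does anything beyond basic point-to-point sends and MPI_Finalize, so a segfault here shouldn't be coming from the test code itself.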
Attachments: hello.out, ring.out