It was compiled with the same OMPI. We see it occasionally on different clusters with different OMPI build folders (all v1.5).
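For what it's worth, a quick way to rule out the mismatch scenario Ralph describes below is to check which libmpi the binary actually resolves at runtime. A minimal sketch (the binary name and install layout here follow the report below; adjust for your own tree):

```shell
# Show which libmpi.so the dynamic linker will load for this binary --
# it should point into the OMPI install tree you think you built against.
ldd ./hello | grep libmpi

# Confirm that the mpirun and ompi_info on the PATH come from that same
# install, and report the expected version.
which mpirun
ompi_info --version
```

If `ldd` resolves libmpi.so from a different prefix than the mpirun being used, that mismatch alone can produce exactly this kind of crash in finalize.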
On Thu, Jan 19, 2012 at 5:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I didn't commit anything to the v1.5 branch yesterday - just the trunk.
>
> As I told Mike off-list, I think it may have been that the binary was
> compiled against a different OMPI version by mistake. It looks very much
> like what I'd expect to have happen in that scenario.
>
> On Jan 19, 2012, at 7:52 AM, Jeff Squyres wrote:
>
> > Did you "svn up"? I ask because Ralph committed some stuff yesterday
> > that may have fixed this.
> >
> > On Jan 18, 2012, at 5:19 PM, Andrew Senin wrote:
> >
> >> No, nothing specific. Only basic settings (--mca btl openib,self
> >> --npernode 1, etc).
> >>
> >> Actually I'm very confused by this error because today it just
> >> disappeared. I had 2 separate folders where it was reproduced in 100%
> >> of test runs. Today I recompiled the source and it is gone in both
> >> folders. But yesterday I tried recompiling multiple times with no
> >> effect. So I believe this must be somehow related to some unknown
> >> settings in the lab which have been changed. Trying to reproduce the
> >> crash now...
> >>
> >> Regards,
> >> Andrew Senin.
> >>
> >> On Thu, Jan 19, 2012 at 12:05 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> >>> Jumping in pretty late in this thread here...
> >>>
> >>> I see that it's failing in opal_hwloc_base_close(). That's a little
> >>> worrisome.
> >>>
> >>> I do see an odd path through the hwloc initialization that *could*
> >>> cause an error during finalization -- but it would involve you setting
> >>> an invalid value for an MCA parameter. Are you setting
> >>> hwloc_base_mem_bind_failure_action or
> >>> hwloc_base_mem_alloc_policy, perchance?
> >>>
> >>> On Jan 16, 2012, at 1:56 PM, Andrew Senin wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I think I've found a bug in the head revision of the OpenMPI 1.5
> >>>> branch. If it is configured with --disable-debug it crashes in
> >>>> finalize on the hello_c.c example. Did I miss something?
> >>>>
> >>>> Configure options:
> >>>> ./configure --with-pmi=/usr/ --with-slurm=/usr/ --without-psm
> >>>> --disable-debug --enable-mpirun-prefix-by-default
> >>>> --prefix=/hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install
> >>>>
> >>>> Runtime command and output:
> >>>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./mpirun --mca btl openib,self
> >>>> --npernode 1 --host mir1,mir2 ./hello
> >>>>
> >>>> Hello, world, I am 0 of 2
> >>>> Hello, world, I am 1 of 2
> >>>> [mir1:05542] *** Process received signal ***
> >>>> [mir1:05542] Signal: Segmentation fault (11)
> >>>> [mir1:05542] Signal code: Address not mapped (1)
> >>>> [mir1:05542] Failing at address: 0xe8
> >>>> [mir2:10218] *** Process received signal ***
> >>>> [mir2:10218] Signal: Segmentation fault (11)
> >>>> [mir2:10218] Signal code: Address not mapped (1)
> >>>> [mir2:10218] Failing at address: 0xe8
> >>>> [mir1:05542] [ 0] /lib64/libpthread.so.0() [0x390d20f4c0]
> >>>> [mir1:05542] [ 1] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8) [0x7f4588cee6a8]
> >>>> [mir1:05542] [ 2] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32) [0x7f4588cee700]
> >>>> [mir1:05542] [ 3] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73) [0x7f4588d1beb2]
> >>>> [mir1:05542] [ 4] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe) [0x7f4588c81eb5]
> >>>> [mir1:05542] [ 5] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a) [0x7f4588c217c3]
> >>>> [mir1:05542] [ 6] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59) [0x7f4588c39959]
> >>>> [mir1:05542] [ 7] ./hello(main+0x69) [0x4008fd]
> >>>> [mir1:05542] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x390ca1ec5d]
> >>>> [mir1:05542] [ 9] ./hello() [0x4007d9]
> >>>> [mir1:05542] *** End of error message ***
> >>>> [mir2:10218] [ 0] /lib64/libpthread.so.0() [0x3a6dc0f4c0]
> >>>> [mir2:10218] [ 1] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8) [0x7f409f31d6a8]
> >>>> [mir2:10218] [ 2] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32) [0x7f409f31d700]
> >>>> [mir2:10218] [ 3] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73) [0x7f409f34aeb2]
> >>>> [mir2:10218] [ 4] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe) [0x7f409f2b0eb5]
> >>>> [mir2:10218] [ 5] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a) [0x7f409f2507c3]
> >>>> [mir2:10218] [ 6] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59) [0x7f409f268959]
> >>>> [mir2:10218] [ 7] ./hello(main+0x69) [0x4008fd]
> >>>> [mir2:10218] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a6d41ec5d]
> >>>> [mir2:10218] [ 9] ./hello() [0x4007d9]
> >>>> [mir2:10218] *** End of error message ***
> >>>> --------------------------------------------------------------------------
> >>>> mpirun noticed that process rank 0 with PID 5542 on node mir1 exited
> >>>> on signal 11 (Segmentation fault).
> >>>> --------------------------------------------------------------------------
> >>>>
> >>>> Thanks,
> >>>> Andrew Senin
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>> --
> >>> Jeff Squyres
> >>> jsquy...@cisco.com
> >>> For corporate legal information go to:
> >>> http://www.cisco.com/web/about/doing_business/legal/cri/
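Regarding Jeff's question above about hwloc_base_mem_bind_failure_action and hwloc_base_mem_alloc_policy: one way to check whether either parameter is being set anywhere in the environment is to dump the registered MCA parameters and look for overrides. A rough sketch (parameter names as quoted in the thread; output format varies by OMPI version):

```shell
# List all registered MCA parameters and filter for the two hwloc
# memory-binding parameters Jeff mentioned, to see their current values
# and where they were set (default, environment, or mca-params.conf).
ompi_info -a | grep hwloc_base_mem

# MCA parameters can also be forced via OMPI_MCA_* environment
# variables; check whether any hwloc override is exported in this shell.
env | grep OMPI_MCA_hwloc
```

If both parameters show their default values and no OMPI_MCA_hwloc variables are exported, the invalid-MCA-value path Jeff describes is unlikely to be the cause.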