Hello, all! I just installed Valgrind (since this looks like a memory issue) and got this interesting output when running the test program:

==4616== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==4616==    at 0x43656BD: syscall (in /lib/tls/libc-2.3.2.so)
==4616==    by 0x4236A75: opal_paffinity_linux_plpa_init (plpa_runtime.c:37)
==4616==    by 0x423779B: opal_paffinity_linux_plpa_have_topology_information (plpa_map.c:501)
==4616==    by 0x4235FEE: linux_module_init (paffinity_linux_module.c:119)
==4616==    by 0x447F114: opal_paffinity_base_select (paffinity_base_select.c:64)
==4616==    by 0x444CD71: opal_init (opal_init.c:292)
==4616==    by 0x43CE7E6: orte_init (orte_init.c:76)
==4616==    by 0x4067A50: ompi_mpi_init (ompi_mpi_init.c:342)
==4616==    by 0x40A3444: PMPI_Init (pinit.c:80)
==4616==    by 0x804875C: main (test.cpp:17)
==4616==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==4616==
==4616== Invalid read of size 4
==4616==    at 0x4095772: ompi_comm_invalid (communicator.h:261)
==4616==    by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
==4616==    by 0x8048770: main (test.cpp:18)
==4616==  Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
[denali:04616] *** Process received signal ***
[denali:04616] Signal: Segmentation fault (11)
[denali:04616] Signal code: Address not mapped (1)
[denali:04616] Failing at address: 0x440000a0
[denali:04616] [ 0] /lib/tls/libc.so.6 [0x42b4de0]
[denali:04616] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x6f) [0x409581f]
[denali:04616] [ 2] ./test(__gxx_personality_v0+0x12d) [0x8048771]
[denali:04616] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x42a2768]
[denali:04616] [ 4] ./test(__gxx_personality_v0+0x3d) [0x8048681]
[denali:04616] *** End of error message ***
==4616==
==4616== Invalid read of size 4
==4616==    at 0x4095782: ompi_comm_invalid (communicator.h:261)
==4616==    by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
==4616==    by 0x8048770: main (test.cpp:18)
==4616==  Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd

The problem is that I still don't know where the issue comes from. Is libc too old and incompatible with g++ 4.4/Open MPI? Is libc broken?

Any help would be highly appreciated.

Thanks,
Catalin

On Mon, Jul 6, 2009 at 3:36 PM, Catalin David <catalindavid2...@gmail.com> wrote:
> On Mon, Jul 6, 2009 at 3:26 PM, jody <jody....@gmail.com> wrote:
>> Hi
>> Are you also sure that you have the same version of Open-MPI
>> on every machine of your cluster, and that it is the mpicxx of this
>> version that is called when you run your program?
>> I ask because you mentioned that there was an old version of Open-MPI
>> present... did you remove this?
>>
>> Jody
>
> Hi
>
> I have just logged in to a few other boxes and they all mount my home
> folder. When running `echo $LD_LIBRARY_PATH` and other commands, I get
> what I expect to get, but this might be because I have set these
> variables in the .bashrc file. So, I tried compiling/running like this
> ~/local/bin/mpicxx [stuff] and ~/local/bin/mpirun -np 4 ray-trace,
> but I get the same errors.
>
> As for the previous version, I don't have root access, therefore I was
> not able to remove it. I was just trying to outrun it by setting the
> $PATH variable to point first at my local installation.
>
> Catalin

--
******************************
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen

Phone: +49-(0)1577-49-38-667

College Ring 4, #343
Bremen, 28759
Germany
******************************