Catalin David wrote:
Hello, all!

Just installed Valgrind (since this seems like a memory issue) and got
this interesting output (when running the test program):

==4616== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==4616==    at 0x43656BD: syscall (in /lib/tls/libc-2.3.2.so)
==4616==    by 0x4236A75: opal_paffinity_linux_plpa_init (plpa_runtime.c:37)
==4616==    by 0x423779B:
opal_paffinity_linux_plpa_have_topology_information (plpa_map.c:501)
==4616==    by 0x4235FEE: linux_module_init (paffinity_linux_module.c:119)
==4616==    by 0x447F114: opal_paffinity_base_select
(paffinity_base_select.c:64)
==4616==    by 0x444CD71: opal_init (opal_init.c:292)
==4616==    by 0x43CE7E6: orte_init (orte_init.c:76)
==4616==    by 0x4067A50: ompi_mpi_init (ompi_mpi_init.c:342)
==4616==    by 0x40A3444: PMPI_Init (pinit.c:80)
==4616==    by 0x804875C: main (test.cpp:17)
==4616==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==4616==
==4616== Invalid read of size 4
==4616==    at 0x4095772: ompi_comm_invalid (communicator.h:261)
==4616==    by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
==4616==    by 0x8048770: main (test.cpp:18)
==4616==  Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
[denali:04616] *** Process received signal ***
[denali:04616] Signal: Segmentation fault (11)
[denali:04616] Signal code: Address not mapped (1)
[denali:04616] Failing at address: 0x440000a0
[denali:04616] [ 0] /lib/tls/libc.so.6 [0x42b4de0]
[denali:04616] [ 1]
/users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x6f)
[0x409581f]
[denali:04616] [ 2] ./test(__gxx_personality_v0+0x12d) [0x8048771]
[denali:04616] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x42a2768]
[denali:04616] [ 4] ./test(__gxx_personality_v0+0x3d) [0x8048681]
[denali:04616] *** End of error message ***
==4616==
==4616== Invalid read of size 4
==4616==    at 0x4095782: ompi_comm_invalid (communicator.h:261)
==4616==    by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
==4616==    by 0x8048770: main (test.cpp:18)
==4616==  Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd


The problem is that now I don't know where the issue comes from (is
libc too old and incompatible with g++ 4.4/OpenMPI? Is libc
broken?).
Looking at the code for ompi_comm_invalid:

static inline int ompi_comm_invalid(ompi_communicator_t* comm)
{
   if ((NULL == comm) || (MPI_COMM_NULL == comm) ||
       (OMPI_COMM_IS_FREED(comm)) || (OMPI_COMM_IS_INVALID(comm)) )
       return true;
   else
       return false;
}


The interesting point is that (MPI_COMM_NULL == comm) evaluates to false; otherwise the following macros (where the invalid read occurs) would not be evaluated at all.
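From your backtrace, test.cpp:17-18 appear to be nothing more than the usual MPI_Init / MPI_Comm_size check, i.e. roughly this (a sketch reconstructed from the stack trace, not your actual file):

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);                  // test.cpp:17 -> PMPI_Init in the trace
    int size = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // test.cpp:18 -> PMPI_Comm_size, invalid read here
    std::printf("MPI_COMM_WORLD size: %d\n", size);
    MPI_Finalize();
    return 0;
}

So the comm that ompi_comm_invalid() ends up dereferencing is just the MPI_COMM_WORLD constant taken from whatever mpi.h the program was compiled against; nothing in a program this small could have corrupted it.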

The only idea that comes to my mind is that you are mixing MPI versions, but, as you said, your PATH is fine?!
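To rule that out, you could print from inside the process which libmpi actually gets mapped on each node, and compare it with the installation whose mpi.h you compiled against. A minimal sketch (Linux-only, it just reads /proc/self/maps; the file name check_mpi.cpp is only an example):

// check_mpi.cpp -- sketch: show which libmpi this process has mapped.
#include <mpi.h>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
    // /proc/self/maps lists every shared object mapped into the process;
    // libmpi is a direct dependency, so it shows up even before MPI_Init.
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line))
        if (line.find("libmpi") != std::string::npos)
            std::cout << line << std::endl;

    MPI_Init(&argc, &argv);
    int size = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // the call that crashes in your test
    std::cout << "MPI_COMM_WORLD size: " << size << std::endl;
    MPI_Finalize();
    return 0;
}

If you build and launch it with the wrappers from ~/local/bin on every node and some rank reports a libmpi outside /users/cluster/cdavid/local/lib (or mpicxx picks up an mpi.h from the old installation), that would explain the invalid communicator handle.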

Regards,
Dorian



Any help would be highly appreciated.

Thanks,
Catalin


On Mon, Jul 6, 2009 at 3:36 PM, Catalin David <catalindavid2...@gmail.com> wrote:
On Mon, Jul 6, 2009 at 3:26 PM, jody <jody....@gmail.com> wrote:
Hi
Are you also sure that you have the same version of Open-MPI
on every machine of your cluster, and that it is the mpicxx of this
version that is called when you run your program?
I ask because you mentioned that there was an old version of Open-MPI
present... did you remove it?

Jody
Hi

I have just logged in to a few other boxes and they all mount my home
folder. When running `echo $LD_LIBRARY_PATH` and other commands, I get
what I expect, but this might be because I have set these variables in
the .bashrc file. So I tried compiling and running with the full paths,
i.e. ~/local/bin/mpicxx [stuff] and ~/local/bin/mpirun -np 4 ray-trace,
but I get the same errors.

As for the previous version, I don't have root access, so I was not
able to remove it. I was just trying to override it by setting the
$PATH variable to point to my local installation first.


Catalin


--

******************************
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen

Phone: +49-(0)1577-49-38-667

College Ring 4, #343
Bremen, 28759
Germany
******************************




