Hi all,
  I've been beating my head against this for quite a while now.  I don't
have this problem when running with 1, 2, or 3 procs; however, once I get
to 4 or beyond, it shows up every time.

When I call malloc, I get this error:

txserver:11055 terminated with signal 11 at PC=2b46886bc18a SP=7fff20f51030.  Backtrace:
/apps/x86_64/mpi/openmpi/gcc-4.3.4/openmpi-1.4.3_oobpr/lib/libopen-pal.so.0(opal_memory_ptmalloc2_int_malloc+0x54a)[0x2b46886bc18a]
/apps/x86_64/mpi/openmpi/gcc-4.3.4/openmpi-1.4.3_oobpr/lib/libopen-pal.so.0[0x2b46886bd4f3]
txserver[0x415769]
txserver[0x40da8c]
txserver[0x4344bb]
txserver[0x4351cd]
txserver[0x40e3d4]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b468a5af994]
txserver(_ZNSt8ios_base4InitD1Ev+0x39)[0x40c889]

However, only rank 0 calls this function, which is strange.  I can even put
a dummy malloc in there, for example (int * dummy = (int *)malloc(10);), and
it will still crash.
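
To make that concrete, the call site boils down to roughly the sketch below.
The function name rank0_only_alloc and the rank passed in as a plain int are
made up for illustration; in the real code the rank comes from MPI_Comm_rank,
and the crash happens inside the malloc call itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the function that only rank 0 calls.
 * In the real program the rank comes from MPI_Comm_rank(); it is a
 * plain argument here so the sketch builds without MPI. */
int *rank0_only_alloc(int rank)
{
    if (rank != 0) {
        return NULL;            /* other ranks never reach the malloc */
    }

    /* The dummy allocation from above: 10 bytes, never written to. */
    int *dummy = (int *)malloc(10);
    if (dummy == NULL) {
        fprintf(stderr, "malloc returned NULL\n");
    }
    return dummy;               /* caller is responsible for free() */
}
```

Nothing in this snippet depends on the number of procs, which is why the
crash puzzles me.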

Again, this does not happen with n < 4 procs.  The crash happens on rank 0,
as it's the only rank that calls this code...

I'm perplexed.

Thanks a lot,
J.D.
