Gadget2 (which I cannot attach because it is not publicly available) runs perfectly fine on any number of processes on systems such as Solaris 10 with Sun CT6 over gigabit, Sun CT5 with Myrinet GM, IBM Regatta ...
Sorry to be so expansive ...

When I run the code on 32 CPUs on Open MPI with MX, using the Studio 11 compilers on a Solaris x64 system, the code works fine until near the end, when it fails to write all the restart files. When I run the code on 64 CPUs, it fails with the following error message:

  Topnodes=218193 costlimit=0.0890015 countlimit=428.229
  Before=44417 After=46281
  NTopleaves= 40496 NTopnodes=46281 (space for 347252)
  desired memory imbalance=2.83425 (limit=100719, needed=114185)
  Note: the domain decomposition is suboptimum because the ceiling for memory-imbalance is reached
  work-load balance=1.28529 memory-balance=1.01948
  exchange of 0002589387 particles
  Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
  Failing at addr:5192cbd0
  /opt/ompi/lib/libopal.so.0.0.0:opal_backtrace_print+0x10
  /opt/ompi/lib/libopal.so.0.0.0:0x99df5
  /lib/amd64/libc.so.1:0xcb276
  /lib/amd64/libc.so.1:0xc0642
  /opt/mx/lib/amd64/libmyriexpress.so:mx__luigi+0xd5 [ Signal 11 (SEGV)]
  /opt/mx/lib/amd64/libmyriexpress.so:mx_irecv+0x174
  /opt/ompi/lib/openmpi/mca_mtl_mx.so:ompi_mtl_mx_irecv+0x116
  /opt/ompi/lib/openmpi/mca_pml_cm.so:mca_pml_cm_irecv+0x27b
  /opt/ompi/lib/libmpi.so.0.0.0:PMPI_Irecv+0x1ae
  /data/rw9/arj/unpack/bench_test_myri2/Gadget2-multidomain/Gadget2:domain_exchange+0x11b7
  /data/rw9/arj/unpack/bench_test_myri2/Gadget2-multidomain/Gadget2:domain_decompose+0x4da
  /data/rw9/arj/unpack/bench_test_myri2/Gadget2-multidomain/Gadget2:domain_Decomposition+0x467
  /data/rw9/arj/unpack/bench_test_myri2/Gadget2-multidomain/Gadget2:run+0x9f
  /data/rw9/arj/unpack/bench_test_myri2/Gadget2-multidomain/Gadget2:main+0x191
  /data/rw9/arj/unpack/bench_test_myri2/Gadget2-multidomain/Gadget2:0x69fc
  *** End of error message ***
  63 additional processes aborted (not shown)

  m2001(26) > /opt/ompi/bin/mpirun -np 32 -machinefile ./myh-all -mca pml cm ./Gadget2 param.txt

As this is one of our main production codes, I need to make sure that it runs on every system I install. Any ideas would be welcome.

Lydia

------------------------------------------
Dr E L Heck
University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road
DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.h...@durham.ac.uk
Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645