Hi,

As the subject suggests, I have a simulation that runs correctly on my iMac
but fails when I run it on a cluster, and I'm hoping someone can suggest
what to try first to resolve the issue.

Background:
The simulation proceeds perfectly well on the iMac (OS X 10.5) with no
errors or warnings. On the cluster it begins producing multiple LINCS
warnings at step 14555 (of 7500000), then segfaults after step 14556 with:

[node-005:13244] *** Process received signal ***
[node-005:13244] Signal: Segmentation fault (11)
[node-005:13244] Signal code: Address not mapped (1)
[node-005:13244] Failing at address: 0x2aaab1380520
[node-005:13244] [ 0] /lib64/libpthread.so.0 [0x2aaaac402b10]
[node-005:13244] [ 1] mdrun_mpi(nb_kernel410_x86_64_sse+0xa65) [0x947e25]
[node-005:13244] [ 2] mdrun_mpi(do_nonbonded+0x780) [0x8ce890]
[node-005:13244] [ 3] mdrun_mpi(do_force_lowlevel+0x308) [0x6842b8]
[node-005:13244] [ 4] mdrun_mpi(do_force+0xc59) [0x6f7c19]
[node-005:13244] [ 5] mdrun_mpi(do_md+0x5785) [0x626f75]
[node-005:13244] [ 6] mdrun_mpi(mdrunner+0xa07) [0x61e8a7]
[node-005:13244] [ 7] mdrun_mpi(main+0x1363) [0x62c5f3]
[node-005:13244] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x2aaaac62d994]
[node-005:13244] [ 9] mdrun_mpi(__gxx_personality_v0+0x479) [0x44b659]
[node-005:13244] *** End of error message ***
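
To make those warnings easier to inspect, something like this could pull
them out of the log with a little context (a rough sketch: it assumes the
warnings contain the literal text "LINCS WARNING", and "md.log" is a
placeholder for the actual log file name):

#!/usr/bin/env python
# Print every LINCS warning in a GROMACS log together with the next
# few lines, so the step numbers and offending atoms are easy to spot.
# "md.log" is a placeholder for the actual log file name.

import sys

logfile = sys.argv[1] if len(sys.argv) > 1 else "md.log"
context = 4  # lines to show after each warning

lines = open(logfile).readlines()
for i, line in enumerate(lines):
    if "LINCS WARNING" in line:
        sys.stdout.write("".join(lines[i:i + context]))
        print("-" * 40)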

Things I have tried:
* Both MPI and non-MPI versions on cluster (same result)
* Harmonising FFTW - configured and compiled fftw3 from the same source
with the same options on both machines, and made sure the correct library
was picked up during the GROMACS configure step (see the link-check sketch
after this list)
* Checking the Reproducibility documentation
* Searching the archives - I didn't find anything that described a similar
problem.
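
To double-check the FFTW harmonisation, something like the following lists
which shared libraries each mdrun binary is actually linked against (a
rough sketch: the default binary path is a placeholder, and it just shells
out to ldd on the cluster and otool -L on OS X):

#!/usr/bin/env python
# List the FFTW/BLAS/LAPACK libraries a binary is actually linked
# against, so the two machines can be compared side by side.
# The default binary path below is a placeholder.

import subprocess
import sys

binary = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/gromacs/bin/mdrun"

# otool -L on OS X, ldd everywhere else
cmd = ["otool", "-L", binary] if sys.platform == "darwin" else ["ldd", binary]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out = proc.communicate()[0].decode()

for line in out.splitlines():
    if any(n in line.lower() for n in ("fftw", "blas", "lapack", "veclib")):
        print(line.strip())

Running it on both machines and diffing the output should show immediately
whether a stray FFTW or BLAS is being picked up on one of them.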

Things I think may be involved:
* Different architectures - i686 vs x86_64 - a quick way to check this is
sketched after this list
* Different BLAS/LAPACK libraries - I believe GROMACS uses vecLib on OS
X; maybe I could compile without external BLAS/LAPACK and see whether this
makes a difference
* Some other unknown problem
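
For the architecture question, a trivial check run on each machine
(assuming Python is available, which it is on OS X and most cluster nodes):

#!/usr/bin/env python
# Report the machine architecture and the pointer size of this Python
# process, to see whether one machine is i686/32-bit and the other
# x86_64/64-bit.

import platform
import struct

print("machine: " + platform.machine())                # e.g. i386 vs x86_64
print("pointer: %d-bit" % (struct.calcsize("P") * 8))  # 32 vs 64 bit

Note this reports the architecture the OS is running; whether mdrun itself
was built 32- or 64-bit shows up in the output of "file mdrun".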

I've now spent more than two weeks trying to diagnose this problem without
making much progress. Could anyone suggest the most likely cause of this
significant difference in output, and what I could do to test or fix it?

Any help is greatly appreciated.

Luke

Attachment: configure-cluster.log
Attachment: configure-imac.log
