Hi Luke,

Could you give us all the necessary information about your system to help us 
figure out where the problem might be?

What kind of compounds are you simulating?
What size of box are you using?
Do you run on multiple threads when you run it on your iMac?
How many CPUs are you using on the cluster?
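If it helps, the build and platform details can be collected with something like the commands below. This assumes a GROMACS 4.x install where the binary is called mdrun or mdrun_mpi and accepts the -version flag; adjust the names if your install differs.

```shell
# Host architecture and kernel (run on both the iMac and a cluster node)
uname -a

# GROMACS build details: version, precision, FFT library, compilers.
# Falls back gracefully if neither binary is on the PATH.
mdrun -version 2>/dev/null || mdrun_mpi -version 2>/dev/null || echo "mdrun not on PATH"
```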

Cheers,
Emanuel

=========================================================
Emanuel Birru
PhD Candidate

Faculty of Pharmacy and Pharmaceutical Sciences
Monash University (Parkville Campus)
381 Royal Parade, Parkville
Victoria 3052, Australia

Tel: Int + 61 3 9903 9187
E-mail: emanuel.bi...@monash.edu
www.pharm.monash.edu.au

From: gmx-users-boun...@gromacs.org On Behalf Of Luke Goodsell
Sent: Wednesday, 13 July 2011 5:36 PM
To: GROMACS Users mailinglist
Subject: [gmx-users] Simulation runs on iMac but explodes on cluster

Hi,

As the subject suggests, I have a simulation that runs correctly on my iMac, 
but fails when I try to run it on a cluster, and I am hoping someone may be 
able to suggest which things to try first to resolve the issue.

Background:
The simulation proceeds perfectly well on the iMac (OS X 10.5) without 
error/warning. On the cluster, it begins producing multiple LINCS warnings at 
step 14555 (of 7500000) and then segfaults after step 14556 with:

[node-005:13244] *** Process received signal ***
[node-005:13244] Signal: Segmentation fault (11)
[node-005:13244] Signal code: Address not mapped (1)
[node-005:13244] Failing at address: 0x2aaab1380520
[node-005:13244] [ 0] /lib64/libpthread.so.0 [0x2aaaac402b10]
[node-005:13244] [ 1] mdrun_mpi(nb_kernel410_x86_64_sse+0xa65) [0x947e25]
[node-005:13244] [ 2] mdrun_mpi(do_nonbonded+0x780) [0x8ce890]
[node-005:13244] [ 3] mdrun_mpi(do_force_lowlevel+0x308) [0x6842b8]
[node-005:13244] [ 4] mdrun_mpi(do_force+0xc59) [0x6f7c19]
[node-005:13244] [ 5] mdrun_mpi(do_md+0x5785) [0x626f75]
[node-005:13244] [ 6] mdrun_mpi(mdrunner+0xa07) [0x61e8a7]
[node-005:13244] [ 7] mdrun_mpi(main+0x1363) [0x62c5f3]
[node-005:13244] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaac62d994]
[node-005:13244] [ 9] mdrun_mpi(__gxx_personality_v0+0x479) [0x44b659]
[node-005:13244] *** End of error message ***
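(For anyone wanting to reproduce the comparison between the two machines: the trajectories can be diffed frame by frame up to the failing step with gmxcheck. This assumes GROMACS 4.x tool names; imac.trr and cluster.trr are placeholder filenames for the two runs' trajectories.)

```shell
# Compare two trajectories frame by frame; gmxcheck reports the first
# frame where positions/velocities differ beyond its tolerance.
gmxcheck -f imac.trr -f2 cluster.trr 2>/dev/null || echo "gmxcheck not on PATH"
```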

Things I have tried:
* Both MPI and non-MPI versions on cluster (same result)
* Harmonising FFTW - configured and compiled fftw3 from same source using same 
configuration and ensured correct library was included during configure step
* Checking the Reproducibility documentation
* Searching the archives - I didn't find anything that described a similar 
problem.

Things I think may be involved:
* Different architectures - i686 vs x86_64 - don't know how to test for this
* Different BLAS/LAPACK libraries - I believe gromacs uses the vecLib 
framework on OS X; maybe I could compile without external BLAS/LAPACK and see 
if this makes a difference
* Some other unknown problem
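On the architecture point above, a quick sanity check is to compare what the kernel reports with what the binary was actually built for. Here mdrun_mpi is assumed to be the cluster binary name; the /bin/sh fallback is only so the commands run anywhere.

```shell
# Machine architecture as the kernel reports it (i686 vs x86_64)
uname -m

# Architecture the mdrun binary was compiled for
BIN=$(command -v mdrun_mpi || command -v mdrun || echo /bin/sh)
file "$BIN"
```

Running both commands on the iMac and on a cluster node should make any i686 vs x86_64 mismatch immediately visible.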

I've now spent more than two weeks trying to diagnose this problem without 
making progress. Could anyone suggest the most likely cause of this 
significant difference in behaviour, and what I could do to test or fix it?

Any help is greatly appreciated.

Luke
