I do not know your code, but:
1) Did you check the stack limit? Intel Fortran codes typically need a large
amount of stack as the problem size increases. Check with ulimit -a (see the
example commands below).
2) Does your node use cpusets and memory limitation (e.g. fake NUMA) to cap
the maximum amount of memory available to a job?
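For example, something along these lines in the shell that launches mpirun
(this assumes bash; csh-style shells use "limit stacksize unlimited" instead):
$ ulimit -a              # show all per-process limits, including "stack size"
$ ulimit -s unlimited    # raise the soft stack limit, if the hard limit allows it
$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
If the hard limit itself is too low, it can be raised system-wide in
/etc/security/limits.conf with "* soft stack unlimited" and "* hard stack
unlimited" entries. And to see whether the process is confined to a cpuset
(assuming cpusets are enabled on your kernel):
$ cat /proc/self/cpuset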
Patrick
Duke Nguyen wrote:
Hi folks,
I am sorry if this question has been asked before, but after ten days of
searching and working on the system, I surrender :(. We are trying to use
mpirun to run abinit (abinit.org), which in turn reads an input file to run a
simulation. The command is pretty simple:
$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
We ran this command on a server with two quad-core X5420s and 16GB of
memory. I requested only 4 cores, and I guess in theory each core should get
up to 2GB.
In the log output there is something about memory:
P This job should need less than 717.175 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
So basically it reported that the job should not need more than about 718MB
per core.
But I still get a Segmentation Fault error:
mpirun noticed that process rank 0 with PID 16099 on node biobos
exited on signal 11 (Segmentation fault).
The system already has its limits set to unlimited:
$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited
I also tried to run
$ ulimit -l unlimited
before the mpirun command above, but it did not help at all.
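That is, the full sequence in a single shell was roughly:
$ ulimit -l unlimited
$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log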
If we adjust the parameters in input.files so that the reported memory per
core is less than 512MB, then the job runs fine.
Please help,
Thanks,
D.