Vaz, Guilherme wrote:
Dear all,

I have a problem with OpenMPI 1.3 and ifort + MKL v11.1 on Ubuntu 10.04 systems (32- or 64-bit). My code worked on Ubuntu 8.04 and works on RedHat-based systems, with slightly different MKL and ifort versions. There were no changes in the source code. The problem is that the application works for small cell counts per core, but not for large cell counts per core. And it always works on 1 core. Example: a grid with 1.2 million cells does not work with mpiexec -n 4 <my_app>, but it works with mpiexec -n 32 <my_app>. It seems that there is a maximum number of cells per core. And it works when run serially as <my_app>.

Is this a stack size problem (or some other memory problem)? Should I set ulimit -s unlimited not only in my bashrc but also in the ssh environment (and how)? Or is it something else?
Any clues/tips?

Thanks for any help.

Gui
dr. ir. Guilherme Vaz
CFD Researcher, Research & Development
MARIN
Haagsteeg 2, P.O. Box 28, 6700 AA Wageningen, The Netherlands
T +31 317 49 33 25 | T +31 317 49 39 11 | F +31 317 49 32 45
E g....@marin.nl | I www.marin.nl
------------------------------------------------------------------------

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Hi Guilherme

Can you estimate how much memory each run configuration requires,
and whether the problem fits in your computer's RAM
(with some slack for the OS, MPI, etc.)?
To check your out-of-memory guess directly,
and to see whether the program starts swapping,
log in to the compute node(s) and use "top".
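For instance, a quick non-interactive snapshot on a node while the job runs might look like this (a sketch using standard Linux tools; run it on each compute node):

```shell
# Overall RAM and swap usage, in megabytes; shrinking free swap during
# the run is a strong hint the job does not fit in physical memory.
free -m

# One batch-mode snapshot of top (no interactive screen), showing the
# load summary and the busiest processes.
top -b -n 1 | head -n 15
```

If the resident size (RES) of your MPI processes grows close to the node's total RAM, the out-of-memory guess is likely correct.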

It is hard to tell the cause of the segfault from this information alone.
It could come from a limited stack size, from insufficient RAM when you
run on a single computer, or from a bug in the code.

On RedHat/Fedora/CentOS
you can set the stack size to unlimited in /etc/security/limits.conf;
the same may work in Ubuntu.
'man limits.conf' may help.
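For example, the relevant entries in /etc/security/limits.conf might look like this (a sketch; '*' applies the limit to all users, and you need to log in again for the new limits to take effect):

```
# /etc/security/limits.conf -- raise the stack limit for all users
*    soft    stack    unlimited
*    hard    stack    unlimited
```

Afterwards you can check what a non-interactive ssh shell actually sees, since that is how remote MPI processes are typically launched: ssh <node> 'ulimit -s' (with <node> a compute node of yours) should then print "unlimited".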

My two cents,
Gus Correa
