Vaz, Guilherme wrote:
Gus,

Thanks for your email. Some more explanation then:

1) We have made this estimate of memory already in the past. My code needs roughly 2.5 GB of RAM per million cells, so for 1.2 million cells we need about 3 GB. The problem occurs on one PC with 12 GB of RAM and 4 cores, so 12 GB is enough. So far (on this and the other systems), when we had memory problems the machine "just" started to swap, but it did not crash.


Hi Guilherme

Now you're talking.  Much better.
So, you know your problem size, you know how much memory you need,
at least w.r.t. what you allocate directly.
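
If you want to check that estimate against what each rank actually uses,
you could have every rank print its own peak resident memory near the end
of the run. A minimal sketch (plain C with getrusage; on Linux ru_maxrss
is reported in kilobytes; this is only an illustration, not code from your
application):

    #include <stdio.h>
    #include <sys/resource.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        struct rusage ru;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* ... allocate and run the solver as usual, then near the end: ... */

        if (getrusage(RUSAGE_SELF, &ru) == 0)
            printf("rank %d: peak RSS %ld MB\n", rank, ru.ru_maxrss / 1024);

        MPI_Finalize();
        return 0;
    }

If a rank reports much more than your estimate divided by the number of
processes, the extra is coming from the libraries or from things you do
not allocate directly.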

2) The code is my code, so I am sure that the code is the same with or without mpiexec, and that I don't use OpenMP directly in the code.

I am a bit surprised that the same code runs with and without mpiexec.
Do you mean the same executable?
Or are they different executables, one
of which you perhaps compile with pre-processor directives to get around
the MPI calls and make it sequential?
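
By pre-processor directives I mean something like the sketch below (purely
illustrative; USE_MPI is a made-up macro name), where the same source is
built once with and once without MPI:

    #ifdef USE_MPI
    #include <mpi.h>
    #endif

    int main(int argc, char **argv)
    {
    #ifdef USE_MPI
        MPI_Init(&argc, &argv);   /* parallel build: compiled with -DUSE_MPI */
    #endif

        /* ... the solver itself, identical in both builds ... */

    #ifdef USE_MPI
        MPI_Finalize();
    #endif
        return 0;
    }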

As for OpenMP, there still remains the possibility that the libraries you
call use threads (with or without OpenMP).

But we also use the Intel MKL libraries together with the PETSc linear-system solvers. I know that MKL tries to start several threads for each MPI process (yes, process, not processor). We disable that by setting MKL_NUM_THREADS=1 (otherwise we immediately see the extra threads appear in the task manager).


I would catch all the return codes from the PETSc calls, print them out if
they indicate an error, and call MPI_Abort. If this is not yet in your code,
add it and keep it there at least until you sort out where the problem is.
If you use MKL directly, not via PETSc, do the same with the MKL calls.
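
Something along these lines, for instance. This is only a minimal sketch
against the PETSc C API; your Fortran code and your PETSc version will
differ, so take it as the pattern I mean, with placeholder object names:

    #include <stdio.h>
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
        Vec            x;
        PetscErrorCode ierr;

        ierr = PetscInitialize(&argc, &argv, NULL, NULL); CHKERRQ(ierr);

        /* Check every return code; on error, report it and bring the whole
           MPI job down instead of letting it die later in a segfault. */
        ierr = VecCreate(PETSC_COMM_WORLD, &x);
        if (ierr) {
            fprintf(stderr, "VecCreate failed with PETSc error %d\n", (int)ierr);
            MPI_Abort(PETSC_COMM_WORLD, ierr);
        }

        /* ... the rest of the solver, destroying objects as usual ... */

        ierr = PetscFinalize();
        return (int)ierr;
    }

The point is that a clean error message plus an MPI_Abort tells you much
more than a segmentation fault somewhere downstream.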

3) All the runs are done on a 64-bit Intel machine with 4 cores and 12 GB of RAM. We don't set any affinity or similar stuff.


I am surprised that it runs with -np 32 on only 4 physical cores,
which is a lot of oversubscription.
I wonder whether that actually reduces the walltime.

4) I could always start more MPI processes than cores, as long as the memory was enough. And the memory is enough; otherwise, how could the same problem fail with 2, 4, 8, or 16 MPI processes and work with 32? (With 32 processes each rank holds only about 37,500 of the 1.2 million cells, versus roughly 300,000 with 4.) That is why I thought of a stack-memory problem.

5) I will see what gdb says about a core-dump tomorrow.

Gus, is this more clear?

Yes.

Do you have any tip now?

No.

Old tip again:
Did you monitor memory use with top while the job is running?
"top -H" shows you all threads.

Don't you think this is a stack-memory problem? The stack size is, by the way, already set with "ulimit -s unlimited".


That certainly helps for number crunching,
although it may not solve your specific problem.
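
One thing you could still verify is whether the "unlimited" setting actually
reaches every MPI process, since processes started by the launcher do not
necessarily read your shell startup files. A minimal sketch (plain POSIX
getrlimit, nothing Open MPI specific) that each rank could run at startup:

    #include <stdio.h>
    #include <sys/resource.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        struct rlimit rl;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Print the stack limit each rank actually inherited. */
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            if (rl.rlim_cur == RLIM_INFINITY)
                printf("rank %d: stack limit unlimited\n", rank);
            else
                printf("rank %d: stack soft limit %lu kB\n",
                       rank, (unsigned long)(rl.rlim_cur / 1024));
        }

        MPI_Finalize();
        return 0;
    }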

Gus Correa

Thanks guys.

Guilherme

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Gus Correa
Sent: Thursday, December 16, 2010 5:55 PM
To: Open MPI Users
Subject: Re: [OMPI users] segmentation fault

Vaz, Guilherme wrote:
Ok, ok. It is indeed a CFD program, and Gus got it right. Number of cells per 
core means memory per core (sorry for the inaccuracy).
My PC has 12GB of RAM.

Can you do one of those typical engineering calculations, a
back-of-the-envelope estimate of how much memory your program needs for a
certain problem size?
This is the first thing to do.
It should tell you whether 12GB is good enough or not.
How many cells, how much memory each cell or array or structure takes,
etc ...

And the same calculation runs fine on an old Ubuntu 8.04 32-bit machine with 4 GB of RAM.
What I find strange is that the same problem runs with 1 core (without
invoking mpiexec)

This one is likely to be a totally different version of the code,
either serial or threaded (perhaps with OpenMP, NOT Open MPI).

and then for a large number of cores/processes, for instance mpiexec -n 32.
Something in between does not.

You didn't explain.
Are all the runs (1 processor, 4 processors, 32 processors)
done on a single machine, or on a cluster?
How many computers are used on each run?
How much memory does each machine have?
Any error messages?

It makes a difference to understand what is going on.
You may saturate the memory on a single machine (your 4-processor run),
but not on, say, four machines (if that is what you mean when you
say it runs on 32 processors).

Please, clarify.
With the current problem description, a solution may not exist,
there may be multiple solutions for multiple, not yet described issues,
or the solution may have nothing to do
with the description you provided or with MPI.
A mathematician would call this an "ill-posed problem",
à la Hadamard. :)
But that is how debugging parallel programs goes.

And it is not a bug in the program, because it runs on other machines
and the code has not been changed.


That is no guarantee against bugs.
They can creep in depending on the computing environment,
on how many computers you are using, on the number of processors,
on any data or parameter that you change,
on a bunch of different things.

Any more hints?


Did you try the ones I sent before, regarding stack size
and monitoring memory via "top"?
What did you get?



Gus

Thanks in advance.

Guilherme




dr. ir. Guilherme Vaz
CFD Researcher
Research & Development
E g....@marin.nl
T +31 317 49 33 25

MARIN
2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands
T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Gus Correa
Sent: Thursday, December 16, 2010 12:46 AM
To: Open MPI Users
Subject: Re: [OMPI users] segmentation fault

Maybe it is CFD jargon?
Perhaps the number (not the size) of cells in a mesh/grid being handled
by each core/CPU?

Ralph Castain wrote:
I have no idea what you mean by "cell sizes per core". Certainly not any
terminology within OMPI...


On Dec 15, 2010, at 3:47 PM, Vaz, Guilherme wrote:

Dear all,

I have a problem with Open MPI 1.3 and ifort + MKL v11.1 on Ubuntu 10.04
systems (32- or 64-bit). My code worked on Ubuntu 8.04 and works on
RedHat-based systems, with slightly different versions of MKL and ifort.
There were no changes in the source code.
The problem is that the application works for a small number of cells per
core, but not for a large number of cells per core. And it always works on
1 core.
Example: a grid with 1.2 million cells does not work with mpiexec -n 4
<my_app>, but it works with mpiexec -n 32 <my_app>. It seems that there
is a maximum number of cells per core. And it works with <my_app> alone.

Is this a stack-size problem (or some other memory problem)? Should I set
ulimit -s unlimited not only in my bashrc but also in the ssh environment
(and how)? Or is it something else?
Any clues/tips?

Thanks for any help.

Gui




dr. ir. Guilherme Vaz
CFD Researcher
Research & Development
E g....@marin.nl
T +31 317 49 33 25

MARIN
2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands
T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl

