Re: [OMPI users] memory leaks on solaris

2007-09-03 Thread Glenn Carver
Hi Don, Somehow I thought it might not be so easy... otherwise it might have been spotted before! Although we first spotted the problem with our own application, I did the most recent tests using the Intel MPI Benchmarks (intel_clustertools3.tar.gz) and saw the same behaviour. It might be i

Re: [OMPI users] memory leaks on solaris

2007-08-10 Thread Don Kerr
Glenn, This will require some more investigation. I have verified that the udapl btl is making the proper calls to free registered memory, and though I have seen the free memory as listed by vmstat drop, I see it come back as well. Additionally, if I run a basic bandwidth test serially (one

Re: [OMPI users] memory leaks on solaris

2007-08-07 Thread Glenn Carver
Don, Following up on this, here are the results of the tests. All is well until udapl is included. In addition there are no mca parameters set in these jobs. As I reported to you before, if I add --mca btl_udapl_flags=1, the memory problem goes away. The batch jobs run vmstat before and aft
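The before/after vmstat comparison described in this message can be sketched as follows. This is only an illustration, not the script used in the thread: the sample vmstat text and its numbers are made up, and the helper names `free_kb` and `memory_lost_kb` are hypothetical. It assumes the output has a header row naming a `free` column.

```python
def free_kb(vmstat_output: str) -> int:
    """Return the 'free' value from the last data row of vmstat output."""
    rows = [ln.split() for ln in vmstat_output.strip().splitlines() if ln.strip()]
    # Find the header row that names the columns; it contains the word "free".
    header = next(i for i, cols in enumerate(rows) if "free" in cols)
    col = rows[header].index("free")
    # The last row is the most recent sample.
    return int(rows[-1][col])


def memory_lost_kb(before: str, after: str) -> int:
    """Free memory that did not come back after the job (positive means lost)."""
    return free_kb(before) - free_kb(after)


# Hypothetical samples, standing in for vmstat run before and after a batch job.
BEFORE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 8000000 6000000 1 2 0 0 0 0 0 0 0 0 0 400 300 200 1 1 98
"""
AFTER = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 8000000 5500000 1 2 0 0 0 0 0 0 0 0 0 400 300 200 1 1 98
"""

if __name__ == "__main__":
    print("free memory not returned:", memory_lost_kb(BEFORE, AFTER))
```

Looking the column up by header name rather than by fixed position keeps the sketch usable if the column layout differs between systems.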

Re: [OMPI users] memory leaks on solaris

2007-08-07 Thread Don Kerr
Glenn, While I look into the possibility of registered memory not being freed, could you run your same tests but without shared memory or udapl, i.e. "--mca btl self,tcp"? If this is successful, i.e. it frees memory as expected, the next step would be to run including shared memory, "--mca btl self,s
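A small sketch of how the staged test runs suggested here might be assembled, one BTL list at a time. Only the "--mca btl self,tcp" selection comes from the message; the helper name `mpirun_args`, the process count, and the program path are assumptions for illustration.

```python
import shlex


def mpirun_args(btls, nprocs, program):
    """Build an mpirun argument list that restricts Open MPI to the given BTLs."""
    return ["mpirun", "--mca", "btl", ",".join(btls), "-np", str(nprocs), program]


if __name__ == "__main__":
    # First stage from the message: no shared memory, no udapl.
    args = mpirun_args(["self", "tcp"], 4, "./a.out")  # 4 and ./a.out are placeholders
    print(shlex.join(args))
```

Later stages would pass a longer BTL list to the same helper, so that each run differs from the previous one by a single transport.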

Re: [OMPI users] memory leaks on solaris

2007-08-07 Thread Don Kerr
I will run some tests to check out this possibility. -DON Jeff Squyres wrote: I guess this is a question for Sun: what happens if registered memory is not freed after a process exits? Does the kernel leave it allocated? On Aug 6, 2007, at 7:00 PM, Glenn Carver wrote: Just to clarify,

Re: [OMPI users] memory leaks on solaris

2007-08-07 Thread Jeff Squyres
I guess this is a question for Sun: what happens if registered memory is not freed after a process exits? Does the kernel leave it allocated? On Aug 6, 2007, at 7:00 PM, Glenn Carver wrote: Just to clarify, the MPI applications exit cleanly. We have our own f90 code (in various configuratio

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Glenn Carver
Just to clarify, the MPI applications exit cleanly. We have our own f90 code (in various configurations) and I'm also testing using Intel's IMB. If I watch the applications whilst they run, there is a drop in free memory as our code begins, the free memory then steadily drops as the code runs.

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Ralph Castain
Guess I don't see how stale shared memory files would cause swapping to occur. Besides, the user provided no indication that the applications were abnormally terminating, which makes it likely we cleaned up the session directories as we should. However, we definitely leak memory (i.e., we don't fr

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Jeff Squyres
Unless there's something weird going on in the Solaris kernel, the only memory that we should be leaking after MPI processes exit would be shared memory files that are [somehow] not getting removed properly. Right? On Aug 6, 2007, at 8:15 AM, Ralph H Castain wrote: Hmmm...just to clarify

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Ralph H Castain
Hmmm...just to clarify as I think there may be some confusion here. Orte-clean will kill any outstanding Open MPI daemons (which should kill their local apps) and will clean up their associated temporary file systems. If you are having problems with zombied processes or stale daemons, then this wil

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Don Kerr
Glenn, With CT7 there is a utility which can be used to clean up leftover cruft from stale MPI processes. % man -M /opt/SUNWhpc/man -s 1 orte-clean Achtung: This will remove currently running jobs as well. Use of "-v" for verbose recommended. I would be curious if this helps. -DON p.s. or

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Ralph H Castain
On 8/5/07 6:35 PM, "Glenn Carver" wrote: > I'd appreciate some advice and help on this one. We're having > serious problems running parallel applications on our cluster. After > each batch job finishes, we lose a certain amount of available > memory. Additional jobs cause free memory to grad

[OMPI users] memory leaks on solaris

2007-08-05 Thread Glenn Carver
I'd appreciate some advice and help on this one. We're having serious problems running parallel applications on our cluster. After each batch job finishes, we lose a certain amount of available memory. Additional jobs cause free memory to gradually go down until the machine starts swapping an