Is gamess calling fork(), perchance? Perhaps through a system() or popen() call?
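A minimal sketch of the pattern I mean, in case it helps to recognize it in the GAMESS source (the command strings here are purely illustrative):

    /* Both system() and popen() fork() under the hood.  Calling either
       after MPI_Init() is known to be unsafe with the OpenFabrics (IB)
       transport, because registered memory is not safely inherited by
       the child process. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        system("hostname");                /* fork() + exec() */

        FILE *p = popen("hostname", "r");  /* also fork()s */
        if (p != NULL) {
            char line[256];
            while (fgets(line, sizeof(line), p) != NULL)
                fputs(line, stdout);
            pclose(p);
        }

        MPI_Finalize();
        return 0;
    }

One way to check without reading the source is to run a rank under "strace -f" and watch for clone()/fork() system calls after startup.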

On Mar 5, 2009, at 3:50 AM, Thomas Exner wrote:

Dear Jeff:

Thank you very much for your reply. Unfortunately, overloading is not
the problem. The phenomenon also appears if we use only two processes
on the 8-core machines. When I run the jobs across two nodes, one of
them stops doing anything after a couple of minutes. The strange thing
is that this happens only on InfiniBand and only with MPI-2 libraries
(Open MPI and MVAPICH2). MVAPICH1 is running reasonably well at the
moment. Perhaps the first two MPI implementations have something in
common that could trigger the problem.
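One more data point that might help isolate it: assuming a stock Open MPI installation, running the identical job with "mpirun --mca btl tcp,self ..." forces the IP transport even on the InfiniBand nodes. If the hang disappears there, the openib BTL path is implicated rather than GAMESS itself.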

Thanks again.
Thomas

Jeff Squyres wrote:
> Sorry for the delay in replying -- INBOX deluge makes me miss emails on
> the users list sometimes.
>
> I'm unfortunately not familiar with gamess -- have you checked with
> their support lists or documentation?
>
> Note that Open MPI's IB progression engine will spin hard to make
> progress for message passing. Specifically, if you have processes
> that are "blocking" in message passing calls, those processes will
> actually be spinning trying to make progress (vs. actually blocking
> in the kernel). So if you overload your hosts -- meaning that you run
> more Open MPI jobs than there are cores -- you could well experience
> dramatic slowdown in overall performance because every MPI job will
> be competing for CPU cycles.
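To make the above concrete, a minimal sketch (nothing GAMESS-specific; just two ranks):

    /* Rank 1 "blocks" in MPI_Recv() while rank 0 sleeps.  With the IB
       transport, rank 1 will sit near 100% CPU in top(1), busy-polling
       for progress, rather than at 0% as true kernel blocking would. */
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, token = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            sleep(60);   /* meanwhile, watch rank 1 spin */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        MPI_Finalize();
        return 0;
    }

On an oversubscribed host, the "mpi_yield_when_idle" MCA parameter makes idle ranks call sched_yield() in the progress loop, which softens (but does not eliminate) the competition for CPU cycles.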
>
>
> On Feb 24, 2009, at 4:57 AM, Thomas Exner wrote:
>
>> Dear all:
>>
>> Because I am new to this list, I would like to introduce myself: I
>> am Thomas Exner. Please excuse any silly questions, because I am
>> only a chemist.
>>
>> And now my problem, with which I have been fiddling around for
>> almost a week: I am trying to use GAMESS with Open MPI on
>> InfiniBand. There is a good description of how to compile it with
>> MPI, and it can be done, even if it is not easy. But then at run
>> time everything gets weird. The specialty of GAMESS is that it runs
>> twice as many MPI processes as are used for the computation. The
>> second half acts as data servers, which handle data requests but
>> generate very little CPU load. Each of these data servers is tied to
>> a specific compute process, so the two corresponding processes have
>> to run on the same node. On one node everything is fine (2 x 4-core
>> machines in my case), because all the processes are guaranteed to
>> run on that node. If I try two nodes, everything is also fine at the
>> beginning: 8 compute processes and 8 data servers are running on
>> each machine. But after a short while, the entire set of 16
>> processes on the first node starts to accumulate CPU time with
>> nothing useful happening, and the second node's processes go
>> entirely to sleep. Is it possible that all the compute processes
>> have for some reason been transferred to the first node? This would
>> explain the load of 16 on the first node and 0 on the second,
>> because 16 compute processes (100% CPU load) and 16 data servers
>> (almost 0% load) would then be running there, respectively. The
>> strange thing is also that the same version runs fine on Gigabit
>> Ethernet and Myrinet.
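One way to rule the placement question in or out is to have every rank report its host; a small sketch:

    /* Print the host each rank actually landed on, to verify that each
       compute process and its data server really share a node. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, len;
        char host[MPI_MAX_PROCESSOR_NAME];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);
        printf("rank %d on %s\n", rank, host);
        MPI_Finalize();
        return 0;
    }

Note that Open MPI does not migrate processes between nodes after launch, so if the mapping is correct at startup it stays that way; identical CPU-time growth across all 16 processes on one node is more consistent with ranks spinning in the progress engine than with migration.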
>>
>> It would be great if somebody could help me on that. If you need more
>> information, I will be happy to share it with you.
>>
>> Thanks very much.
>> Thomas
>>
>>
>
>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
