Does OMPI have shared-memory capabilities (as mentioned in MPI-2)? How can I use them?
Andrei

On Sat, Sep 25, 2010 at 23:19, Andrei Fokau <andrei.fo...@neutron.kth.se> wrote:

> Here are some more details about our problem. We use a dozen 4-processor nodes with 8 GB of memory on each node. The code we run needs about 3 GB per processor, so we can load only 2 processors out of 4. The vast majority of those 3 GB is the same for each processor and is accessed continuously during the calculation. In my original question I wasn't very clear when asking about the possibility of using shared memory with Open MPI - in our case we do not need remote access to the data, and it would be sufficient to share memory within each node only.
>
> Of course, the possibility of accessing the data remotely (via mmap) is attractive because it would allow us to store much larger arrays (up to 10 GB) in one remote place, meaning higher accuracy for our calculations. However, I believe that the access time would be too long for data read so frequently, and therefore the performance would be lost.
>
> I still hope that some of the subscribers to this mailing list have experience of using Global Arrays. This library seems to be fine for our case; however, I feel that there should be a simpler solution. Open MPI conforms to the MPI-2 standard, and the latter describes shared-memory applications. Do you see any other way for us to use shared memory (within a node) apart from using Global Arrays?
>
> Andrei
>
> On Fri, Sep 24, 2010 at 19:03, Durga Choudhury <dpcho...@gmail.com> wrote:
>
>> I think the 'middle ground' approach can be simplified even further if the data file is on a shared device (e.g. an NFS/Samba mount) that can be mounted at the same location in the file system tree on all nodes. I have never tried it, though, and mmap()'ing a non-POSIX-compliant file system such as Samba might have issues I am unaware of.
>>
>> However, I do not see why you should not be able to do this even if the file is being written to, as long as you call msync() before using the mapped pages.
>>
>> Durga
>>
>> On Fri, Sep 24, 2010 at 12:31 PM, Eugene Loh <eugene....@oracle.com> wrote:
>>
>>> It seems to me there are two extremes.
>>>
>>> One is that you replicate the data for each process. This has the disadvantage of consuming lots of memory "unnecessarily."
>>>
>>> Another extreme is that shared data is distributed over all processes. This has the disadvantage of making at least some of the data less accessible, whether in programming complexity and/or run-time performance.
>>>
>>> I'm not familiar with Global Arrays. I was somewhat familiar with HPF. I think the natural thing to do with those programming models is to distribute data over all processes, which may relieve the excessive memory consumption you're trying to address but which may also just put you at a different "extreme" of this spectrum.
>>>
>>> The middle ground that I think might make the most sense would be to share data only within a node, but to replicate the data for each node. There are probably multiple ways of doing this -- possibly even GA, I don't know. One way might be to use one MPI process per node, with OMP multithreading within each process|node. Or (and I thought this was the solution you were looking for), have some idea which processes are collocal. Have one process per node create and initialize some shared memory -- mmap, perhaps, or SysV shared memory. Then, have its peers map the same shared memory into their address spaces.
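In C, this per-node approach might look roughly like the sketch below. It is not from the thread: it assumes POSIX shared memory (shm_open/mmap, link with -lrt on Linux) and that processes reporting the same hostname share a node; the helper name node_shared_alloc and the segment name are illustrative only, and error handling is omitted.

/* Minimal sketch: one shared, read-mostly segment per node.
 * The lowest rank on each node creates the segment, its peers attach. */
#include <mpi.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

static void *node_shared_alloc(MPI_Comm comm, size_t bytes, MPI_Comm *nodecomm)
{
    char host[MPI_MAX_PROCESSOR_NAME];
    int len, rank, noderank, fd, i;
    unsigned color = 0;

    /* Derive a per-node "color" from the hostname; a crude hash is enough
     * for a sketch, but a robust version would compare full hostnames. */
    MPI_Get_processor_name(host, &len);
    for (i = 0; host[i] != '\0'; i++)
        color = color * 31u + (unsigned char)host[i];
    color &= 0x7fffffff;                     /* MPI_Comm_split needs >= 0 */

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_split(comm, (int)color, rank, nodecomm);  /* collocal ranks */
    MPI_Comm_rank(*nodecomm, &noderank);

    char shmname[64];
    snprintf(shmname, sizeof shmname, "/app_shared_%u", color);

    if (noderank == 0) {                     /* node leader creates and sizes it */
        fd = shm_open(shmname, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, (off_t)bytes);
    }
    MPI_Barrier(*nodecomm);                  /* segment now exists on this node */
    if (noderank != 0)
        fd = shm_open(shmname, O_RDWR, 0600);

    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping remains valid */
    return p;                                /* same physical pages for all ranks
                                                on this node */
}

The node-leader rank would then read the input file into the returned buffer, followed by a barrier on nodecomm before the other ranks start reading it, and an shm_unlink() at shutdown to clean up the segment.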
>>> You asked what source code changes would be required. It depends. If you're going to mmap shared memory on each node, you need to know which processes are collocal. If you're willing to constrain how processes are mapped to nodes, this could be easy. (E.g., "every 4 processes are collocal".) If you want to discover dynamically at run time which are collocal, it would be harder. The mmap stuff could be in a stand-alone function of about a dozen lines. If the shared area is allocated as one piece, substituting the single malloc() call with a call to your mmap function should be simple. If you have many malloc()s you're trying to replace, it's harder.
>>>
>>> Andrei Fokau wrote:
>>>
>>> The data are read from a file and processed before calculations begin, so I think that mapping will not work in our case.
>>>
>>> Global Arrays look promising indeed. As I said, we need to put just a part of the data in the shared section. John, do you (or maybe other users) have experience of working with GA?
>>> http://www.emsl.pnl.gov/docs/global/um/build.html
>>>
>>> When GA runs with MPI:
>>>
>>> MPI_Init(..)    ! start MPI
>>> GA_Initialize() ! start global arrays
>>> MA_Init(..)     ! start memory allocator
>>> ....            ! do work
>>> GA_Terminate()  ! tidy up global arrays
>>> MPI_Finalize()  ! tidy up MPI
>>>                 ! exit program
>>>
>>> On Fri, Sep 24, 2010 at 13:44, Reuti <re...@staff.uni-marburg.de> wrote:
>>>
>>>> Am 24.09.2010 um 13:26 schrieb John Hearns:
>>>>
>>>>> On 24 September 2010 08:46, Andrei Fokau <andrei.fo...@neutron.kth.se> wrote:
>>>>>
>>>>>> We use a C program which consumes a lot of memory per process (up to a few GB), 99% of the data being the same for each process. So for us it would be quite reasonable to put that part of the data in shared memory.
>>>>>
>>>>> http://www.emsl.pnl.gov/docs/global/
>>>>>
>>>>> Is this any help? Apologies if I'm talking through my hat.
>>>>
>>>> I was also thinking of this when I read "data in a shared memory" (besides approaches like http://www.kerrighed.org/wiki/index.php/Main_Page). Wasn't this also one idea behind "High Performance Fortran" - running in parallel across nodes without even knowing, while programming, that it is across nodes, and accessing all data as if it were local?
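Fleshing out the GA skeleton quoted above, a large read-mostly array might be handled along the following lines. This is only a sketch against the C bindings described in the GA user manual (GA_Initialize, MA_init, NGA_Create, NGA_Put, NGA_Get); the header names, type constants, MA_init sizes, and array dimensions shown here are assumptions and may differ between GA versions.

/* Sketch only: create one global array, fill it from rank 0, and let
 * any rank fetch the slice it needs.  Exact headers and constants
 * (ga.h, macdecls.h, C_DBL) may vary with the GA version. */
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);              /* start MPI                   */
    GA_Initialize();                     /* start global arrays         */
    MA_init(C_DBL, 1000000, 1000000);    /* start memory allocator
                                            (placeholder stack/heap)    */

    int dims[1]  = { 1000000 };          /* placeholder array size      */
    int chunk[1] = { -1 };               /* let GA pick the layout      */
    char name[]  = "shared_data";
    int g_a = NGA_Create(C_DBL, 1, dims, name, chunk);

    if (GA_Nodeid() == 0) {              /* e.g. rank 0 loads the file  */
        static double buf[1000];         /* placeholder input block     */
        int lo = 0, hi = 999, ld = 1;    /* ld is unused for 1-D        */
        NGA_Put(g_a, &lo, &hi, buf, &ld);
    }
    GA_Sync();                           /* make the data visible       */

    double patch[100];                   /* any rank reads what it needs */
    int lo = 100, hi = 199, ld = 1;
    NGA_Get(g_a, &lo, &hi, patch, &ld);

    GA_Destroy(g_a);
    GA_Terminate();                      /* tidy up global arrays       */
    MPI_Finalize();                      /* tidy up MPI                 */
    return 0;
}

Note the trade-off discussed earlier in the thread: unlike the per-node mmap approach, the array here is distributed across processes, so portions held remotely are fetched over the interconnect rather than read from node-local shared memory.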