[OMPI users] no ikrit component in oshmem
Hi!

I am trying to build Open MPI 1.8 with OpenSHMEM and Mellanox MXM, but oshmem_info does not display the information about ikrit in spml.

...
MCA scoll: mpi (MCA v2.0, API v1.0, Component v1.8)
MCA spml: yoda (MCA v2.0, API v2.0, Component v1.8)
MCA sshmem: mmap (MCA v2.0, API v2.0, Component v1.8)
...

Best regards,

tismagi...@mail.ru

ompi-output.tar.bz2
Description: Binary data
Re: [OMPI users] no ikrit component in oshmem
Hi Timur,

What "configure" line did you use? The ikrit component will not be compiled if "--with-mxm=/opt/mellanox/mxm" was not provided.

Can you please attach your config.log?

Thanks

On Wed, Apr 23, 2014 at 3:10 PM, Тимур Исмагилов wrote:
> Hi!
> I am trying to build Open MPI 1.8 with OpenSHMEM and Mellanox MXM, but oshmem_info does not display the information about ikrit in spml.
>
> ...
> MCA scoll: mpi (MCA v2.0, API v1.0, Component v1.8)
> MCA spml: yoda (MCA v2.0, API v2.0, Component v1.8)
> MCA sshmem: mmap (MCA v2.0, API v2.0, Component v1.8)
> ...
>
> Best regards,
>
> tismagi...@mail.ru
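For reference, a minimal rebuild sketch along those lines (the install prefix and make parallelism below are placeholders, and the exact oshmem_info output depends on the build):

    ./configure --prefix=$HOME/openmpi-1.8 --with-mxm=/opt/mellanox/mxm
    make -j4 && make install
    # after reinstalling, ikrit should be listed alongside yoda:
    oshmem_info | grep -i spml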
[OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC
I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, built with the Intel 12.1 compiler suite and OpenMPI 1.6.5 (and running with OMPI 1.6.5). For performance analysis I decided to use the following MCA parameters:

--mca btl openib,tcp,self --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32

where before I had always used "--mca btl openib,tcp,self". On the Sendrecv benchmark at 32 processes, IMB hangs. I then tried:

--mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32

and IMB also hangs on the Sendrecv benchmark, though at 64 processes.

No errors have been recorded, not even in any system log files, but 'top' shows the MPI tasks running. How can I go about troubleshooting this hang, as well as figuring out what (if any) XRC-related MCA parameters in btl_openib_receive_queues I have to specify to get IMB running properly? I did verify the IB cards are ConnectX.

--john
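For reference, the default value and accepted syntax of the receive-queue parameter can be checked against the same install with ompi_info; a hedged sketch (output wording varies by Open MPI version):

    # list the openib BTL parameters, including the default btl_openib_receive_queues
    ompi_info --param btl openib | grep -i receive_queues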
Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC
A few suggestions:

- Try using Open MPI 1.8.1. It's the newest release, and has many improvements since the 1.6.x series.

- Try using "--mca btl openib,sm,self" (in both v1.6.x and v1.8.x). This allows Open MPI to use shared memory to communicate between processes on the same server, which can be a significant performance improvement over TCP or even IB.

On Apr 23, 2014, at 11:10 AM, "Sasso, John (GE Power & Water, Non-GE)" wrote:

> I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, built with the Intel 12.1 compiler suite and OpenMPI 1.6.5 (and running with OMPI 1.6.5). For performance analysis I decided to use the following MCA parameters:
>
> --mca btl openib,tcp,self --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32
>
> where before I had always used "--mca btl openib,tcp,self". On the Sendrecv benchmark at 32 processes, IMB hangs. I then tried:
>
> --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
>
> and IMB also hangs on the Sendrecv benchmark, though at 64 processes.
>
> No errors have been recorded, not even in any system log files, but 'top' shows the MPI tasks running. How can I go about troubleshooting this hang, as well as figuring out what (if any) XRC-related MCA parameters in btl_openib_receive_queues I have to specify to get IMB running properly? I did verify the IB cards are ConnectX.
>
> --john

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
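A hedged sketch of a launch line that combines that suggestion with the XRC queue specification from the original post (the hostfile name, process count, and IMB binary path are placeholders):

    mpirun -np 64 --hostfile hosts \
        --mca btl openib,sm,self \
        --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32 \
        ./IMB-MPI1 Sendrecv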
Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC
Thank you, Jeff. I re-ran IMB (a 64-core run, distributed across a number of nodes) under different MCA parameters. Here are the results using OpenMPI 1.6.5:

1. --mca btl openib,sm,self --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32
   IMB did not hang. Consumed 9263 sec (aggregate) CPU time and 8986 MB memory.

2. --mca btl openib,sm,self --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
   IMB hung on the Bcast benchmark on a 64-process run, with a message size of 64 bytes.

3. --mca btl openib,sm,self
   IMB did not hang. Consumed 9360 sec (aggregate) CPU time and 9360 MB memory.

4. --mca btl openib,tcp,self
   IMB did not hang. Consumed 41911 sec (aggregate) CPU time and 9239 MB memory.

I did not try OpenMPI 1.8.1 since I am restricted to 1.6.5 at this time, but I'm doing a build of 1.8.1 now to test out. BTW, the release notes refer to 1.8.2, but the site only has 1.8.1 available for download.

I am a bit concerned, however, with my prior runs hanging. First, I was unable to discern why IMB was hanging, so any advice/guidance would be greatly appreciated. I tried doing an strace on an MPI process but got no helpful info. Second, the motivation behind using XRC was to cut down on memory demands w.r.t. the RC QPs. I'd like to get this working, unless someone can elaborate on the negative aspects of using XRC instead of RC QPs. Thanks!

--john

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Wednesday, April 23, 2014 11:19 AM
To: Open MPI Users
Subject: Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC

A few suggestions:

- Try using Open MPI 1.8.1. It's the newest release, and has many improvements since the 1.6.x series.

- Try using "--mca btl openib,sm,self" (in both v1.6.x and v1.8.x). This allows Open MPI to use shared memory to communicate between processes on the same server, which can be a significant performance improvement over TCP or even IB.

On Apr 23, 2014, at 11:10 AM, "Sasso, John (GE Power & Water, Non-GE)" wrote:

> I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, built with the Intel 12.1 compiler suite and OpenMPI 1.6.5 (and running with OMPI 1.6.5). For performance analysis I decided to use the following MCA parameters:
>
> --mca btl openib,tcp,self --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32
>
> where before I had always used "--mca btl openib,tcp,self". On the Sendrecv benchmark at 32 processes, IMB hangs. I then tried:
>
> --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
>
> and IMB also hangs on the Sendrecv benchmark, though at 64 processes.
>
> No errors have been recorded, not even in any system log files, but 'top' shows the MPI tasks running. How can I go about troubleshooting this hang, as well as figuring out what (if any) XRC-related MCA parameters in btl_openib_receive_queues I have to specify to get IMB running properly? I did verify the IB cards are ConnectX.
>
> --john

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
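One way to see where such a hung run is stuck, sketched here under the assumption that gdb is available on the compute nodes (the IMB binary name and the PID are placeholders to adjust for the actual job), is to attach a debugger to one of the busy ranks and dump its stack:

    # on a node where 'top' shows a spinning IMB rank
    pgrep -f IMB-MPI1                                 # find the rank's PID (binary name is a placeholder)
    gdb -p <PID> -batch -ex "thread apply all bt"     # print backtraces, then exit/detach

The backtrace at least shows which MPI call each rank is blocked in, which is usually more informative than strace for a busy-polling hang.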
Re: [OMPI users] mpi.isend still not working (was trying to use personal copy of 1.7.4--solved)
On Wed, 2014-04-23 at 13:05 -0400, Hao Yu wrote:
> Hi Ross,
>
> Sorry for getting back to you late on this issue. After finishing my course, I am working on Rmpi 0.6-4, to be released soon to CRAN.
>
> I did a few tests like yours and indeed I was able to produce some deadlocks whenever mpi.isend.Robj is used. Later on I traced it to some kind of race condition. If you use mpi.test to check whether mpi.isend.Robj has finished its job or not, this deadlock may be avoided. I did something like
>
> mpi.isend.Robj(r, 1, 4, request=0)
> mpi.test(0)
>
> If mpi.test(0) returns FALSE and I run
>
> mpi.send.Robj(r2, 1, 4)
>
> then I get no prompt. If mpi.test(0) returns TRUE, then
>
> mpi.send.Robj(r2, 1, 4)
>
> is OK. So, if any nonblocking calls are used, one must use mpi.test or mpi.wait to check that they are complete before trying any blocking calls.

That sounds like a different problem than the one I encountered. The system did get hung up, but the reason was that processes received corrupted R objects, threw an error, and stopped responding.

The root of my problem was that objects got garbage collected before the isend completed. This will happen regardless of subsequent R-level calls (e.g., to mpi.test). The object to be transmitted is serialized and passed to C, but when the call returns there are no R references to the object--that is, to the serialized version of the object--and so it is subject to garbage collection.

I'd be happy to provide my modifications to get around this. Although they worked for me, they are not really suitable for general use. There are 2 main issues: first, I ignored the asynchronous receive since I didn't use it. Since MPI request objects are used for both sending and receiving, I suspect that mixing irecvs in with code doing isends would not work right. I don't think there's any reason in principle the handling of isends could not be extended to include irecvs; I just didn't do it. I also did not put the hooks for the new stuff in the calls that reset the maximum number of requests.

The second issue is that my fix changed the interface to a slightly higher level of abstraction. Request objects and numbers are managed more by Rmpi than by the user. Rmpi keeps references to the serialized objects around as long as the request is outstanding. For example, the revised mpi.isend does not take a request number; the function works out one and returns it. And in general the calls do more than simply call the corresponding C function.

Ross Boylan

>
> Hao
>
>
> Ross Boylan wrote:
> > I changed the calls to dlopen in Rmpi.c so that it tried libmpi.so before libmpi.so.0. I also rebuilt MPI, R, and Rmpi as suggested earlier by Bennet Fauber (http://www.open-mpi.org/community/lists/users/2014/03/23823.php). Thanks Bennet!
> >
> > My theory is that the change to dlopen by itself was sufficient. The rebuilding done before (by others) may have worked because they made the load of libmpi.so.0 fail. That's not a great theory, since a) if there was no libmpi.so.0 on the system it would fail anyway, and b) dlopen could probably find libmpi.so.0 in standard system locations regardless of how R was built or how LD_LIBRARY_PATH was set up (assuming it didn't find it in a custom place first).
> >
> > Which brings me back to my original problem: mpi.isend.Robj (or possibly mpi.recv.Robj on the other end) did not seem to be working properly. I had hoped switching to a newer MPI library (1.7.4) would fix this; if anything, it made it worse.
> > I am sending to a fake receiver (at rank 1) that does nothing but print a message when it gets a message. r is a list with
> >> length(serialize(r, NULL))  # the mpi.isend.Robj R function serializes the object and then mpi.isend's it.
> > length(serialize(r, NULL))
> > [1] 599499                   # ~ 0.5 MB
> >> mpi.send.Robj(1, 1, 4)      # send of number works
> > Fake Assembler: 0 4 numeric
> >> mpi.send.Robj(r, 1, 4)      # send of r works
> > NULL
> >> Fake Assembler: 0 4 list
> > mpi.isend.Robj(1, 1, 4)      # isend of number works
> >> Fake Assembler: 0 4 numeric
> > mpi.isend.Robj(r, 1, 4)      # sometimes this used to work the first time
> >> mpi.send.Robj(r, 1, 4)      # sometimes used to get previous message unstuck
> > # never get the command prompt back
> > # presumably mpi.send, the C function, does not return.
> >
> > I might just switch to mpi.send, though the fact that something is going wrong makes me nervous.
> >
> > Obviously, given the involvement of R, it's not clear the problem lies with the MPI layer, but that seems at least a possibility.
> >
> > Ross
> > On Thu, 2014-03-13 at 12:15 -0700, Ross Boylan wrote:
> >> On Wed, 2014-03-12 at 10:52 -0400, Bennet Fauber wrote:
> >> > My experience with Rmpi and OpenMPI is that it doesn't seem to do well with the dlopen or dynamic loading. I recently installed R 3.0.3, and Rmpi, which failed when built against our standard OpenMPI but succeeded us
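Regarding the dlopen/libmpi.so.0 question quoted above, a quick hedged check of which libmpi variants the dynamic loader can actually see (library locations are system-specific and shown here only as examples):

    ldconfig -p | grep libmpi    # shared libraries known to the loader cache
    echo $LD_LIBRARY_PATH        # custom locations searched before the cache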