[OMPI users] no ikrit component in oshmem

2014-04-23 Thread Тимур Исмагилов
 Hi!
I am trying to build Open MPI 1.8 with OpenSHMEM and Mellanox MXM,
but oshmem_info does not show an ikrit component under spml.
...
MCA scoll: mpi (MCA v2.0, API v1.0, Component v1.8)
MCA spml: yoda (MCA v2.0, API v2.0, Component v1.8)
MCA sshmem: mmap (MCA v2.0, API v2.0, Component v1.8)
...
Best regards,

tismagi...@mail.ru

ompi-output.tar.bz2
Description: Binary data


Re: [OMPI users] no ikrit component in oshmem

2014-04-23 Thread Mike Dubman
Hi Timur,

What "configure" line you used? ikrit could be compile-it if no
"--with-mxm=/opt/mellanox/mxm" was provided.
Can you please attach your config.log?
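
For reference, here is a minimal sketch of a configure invocation that pulls
in MXM support (the install prefix below is a placeholder; adjust both paths
to your installation):

    ./configure --prefix=/path/to/openmpi-1.8 \
        --with-mxm=/opt/mellanox/mxm
    make all install

Once rebuilt against MXM, running something like "oshmem_info | grep spml"
should list an ikrit component alongside yoda.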

Thanks



On Wed, Apr 23, 2014 at 3:10 PM, Тимур Исмагилов  wrote:

> Hi!
> I am trying to build Open MPI 1.8 with OpenSHMEM and Mellanox MXM,
> but oshmem_info does not show an ikrit component under spml.
>
> ...
> MCA scoll: mpi (MCA v2.0, API v1.0, Component v1.8)
> MCA spml: yoda (MCA v2.0, API v2.0, Component v1.8)
> MCA sshmem: mmap (MCA v2.0, API v2.0, Component v1.8)
> ...
>
> Best regards,
>
> tismagi...@mail.ru
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC

2014-04-23 Thread Sasso, John (GE Power & Water, Non-GE)
I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, built with the 
Intel 12.1 compiler suite and Open MPI 1.6.5 (and run with Open MPI 1.6.5).  
I decided to use the following MCA parameters:

--mca btl openib,tcp,self --mca btl_openib_receive_queues 
X,9216,256,128,32:X,65536,256,128,32

where before, I always used "--mca btl openib,tcp,self".  This is for 
performance analysis.  On the SendRecv benchmark at 32 processes, IMB hangs.  I 
then tried:

--mca btl_openib_receive_queues 
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32

and IMB also hangs on the SendRecv benchmark, though at 64 processes.

No errors have been recorded, not even in any system log files, but 'top' shows 
the MPI tasks running.  How can I go about troubleshooting this hang, as well 
as figuring out what (if any) XRC-related MCA parameters in 
btl_openib_receive_queues I have to specify to get IMB running properly?  I 
did verify that the IB cards are ConnectX.
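
For reference, the default value of btl_openib_receive_queues and the rest of
the openib BTL parameters can be inspected with something like the sketch
below (exact output wording will vary by Open MPI version):

    ompi_info --param btl openib | grep receive_queues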

--john


Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC

2014-04-23 Thread Jeff Squyres (jsquyres)
A few suggestions:

- Try using Open MPI 1.8.1.  It's the newest release, and has many improvements 
since the 1.6.x series.

- Try using "--mca btl openib,sm,self" (in both v1.6.x and v1.8.x).  This 
allows Open MPI to use shared memory to communicate between processes on the 
same server, which can be a significant performance improvement over TCP or 
even IB.
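
For example, a run combining this suggestion with the receive-queue settings
from the original post might look like the following sketch (process count
and benchmark binary name are placeholders):

    mpirun -np 32 --mca btl openib,sm,self \
        --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32 \
        ./IMB-MPI1 Sendrecv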



On Apr 23, 2014, at 11:10 AM, "Sasso, John (GE Power & Water, Non-GE)" 
 wrote:

> I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, which was 
> built with Intel 12.1 compiler suite and OpenMPI 1.6.5 (and running w/ OMPI 
> 1.6.5).  I decided to use the following for the mca parameters:
>  
> --mca btl openib,tcp,self --mca btl_openib_receive_queues 
> X,9216,256,128,32:X,65536,256,128,32
>  
> where before, I always used “--mca btl openib,tcp,self”.  This is for 
> performance analysis.  On the SendRecv benchmark at 32 processes, IMB hangs.  
> I then tried:
>  
> --mca btl_openib_receive_queues 
> X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
>  
> and IMB also hangs on the SendRecv benchmark, though at 64 processes.
>  
> No errors have been recorded, not even in any system log files but ‘top’ 
> shows the MPI tasks running.  How can I go about troubleshooting this hang, 
> as well as figuring out what (If any) MCA XRC-related parameters in 
> btl_openib_receive_queues I have to specify to get IMB running properly?   I 
> did verify the IB cards are ConnectX.
>  
> --john
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC

2014-04-23 Thread Sasso, John (GE Power & Water, Non-GE)
Thank you, Jeff.  I re-ran IMB (a 64-core run, distributed across a number of 
nodes) with different MCA parameters.  Here are the results using Open MPI 
1.6.5:

1. --mca btl openib,sm,self --mca btl_openib_receive_queues 
X,9216,256,128,32:X,65536,256,128,32
IMB did not hang.  Consumed 9263 sec (aggregate) CPU time and 8986 MB 
memory

2. --mca btl openib,sm,self --mca btl_openib_receive_queues 
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
IMB hung on the Bcast benchmark (64-process run) at a message size of 
64 bytes

3. --mca btl openib,sm,self
IMB did not hang.  Consumed 9360 sec (aggregate) CPU time and 9360 MB 
memory

4. --mca btl openib,tcp,self
IMB did not hang.  Consumed 41911 sec (aggregate) CPU time and 9239 MB 
memory

I did not try Open MPI 1.8.1 since I am restricted to 1.6.5 at this time, but 
I'm doing a build of 1.8.1 now to test it out.  BTW, the release notes refer to 
1.8.2, but the site only has 1.8.1 available for download.

I am a bit concerned, however, with my prior runs hanging.  First, I was unable 
to discern why IMB was hanging, so any advice/guidance would be greatly 
appreciated.  I tried running strace on an MPI process, but it gave no helpful 
information.
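
One way to dig deeper is to grab stack traces from a hung rank; the sketch
below assumes gdb is available on the compute nodes and <pid> stands for the
process ID of one of the hung IMB tasks shown by top:

    # attach to the hung rank, dump all thread backtraces, then detach
    gdb --batch -ex 'thread apply all bt' -p <pid>

Repeating this on a few ranks usually shows where in the MPI library the
processes are stuck.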

Second, the motivation behind using XRC was to cut down on memory demands 
w.r.t. the RC QPs.   I'd like to get this working, unless someone can elaborate 
on the negative aspects of using XRC instead of RC QPs.  Thanks!

--john


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Wednesday, April 23, 2014 11:19 AM
To: Open MPI Users
Subject: Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC

A few suggestions:

- Try using Open MPI 1.8.1.  It's the newest release, and has many improvements 
since the 1.6.x series.

- Try using "--mca btl openib,sm,self" (in both v1.6.x and v1.8.x).  This 
allows Open MPI to use shared memory to communicate between processes on the 
same server, which can be a significant performance improvement over TCP or 
even IB.



On Apr 23, 2014, at 11:10 AM, "Sasso, John (GE Power & Water, Non-GE)" 
 wrote:

> I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, which was 
> built with Intel 12.1 compiler suite and OpenMPI 1.6.5 (and running w/ OMPI 
> 1.6.5).  I decided to use the following for the mca parameters:
>  
> --mca btl openib,tcp,self --mca btl_openib_receive_queues 
> X,9216,256,128,32:X,65536,256,128,32
>  
> where before, I always used "--mca btl openib,tcp,self".  This is for 
> performance analysis.  On the SendRecv benchmark at 32 processes, IMB hangs.  
> I then tried:
>  
> --mca btl_openib_receive_queues 
> X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
>  
> and IMB also hangs on the SendRecv benchmark, though at 64 processes.
>  
> No errors have been recorded, not even in any system log files but 'top' 
> shows the MPI tasks running.  How can I go about troubleshooting this hang, 
> as well as figuring out what (If any) MCA XRC-related parameters in 
> btl_openib_receive_queues I have to specify to get IMB running properly?   I 
> did verify the IB cards are ConnectX.
>  
> --john
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] mpi.isend still not working (was trying to use personal copy of 1.7.4--solved)

2014-04-23 Thread Ross Boylan
On Wed, 2014-04-23 at 13:05 -0400, Hao Yu wrote:
> Hi Ross,
> 
> Sorry for getting back to you late on this issue. After finishing my course, I
> am working on Rmpi 0.6-4, to be released soon to CRAN.
> 
> I did a few tests like yours and indeed I was able to produce some
> deadlocks whenever mpi.isend.Robj is used. Later on I traced it to some
> kind of race condition. If you use mpi.test to check whether mpi.isend.Robj
> has finished its job or not, this deadlock may be avoided. I did something like
> mpi.isend.Robj(r,1,4,request=0)
> mpi.test(0)
> 
> If mpi.test(0) returns FALSE and I run
> mpi.send.Robj(r2,1,4)
> 
> then I get no prompt. If mpi.test(0) returns TRUE, then
> mpi.send.Robj(r2,1,4)
> 
> is OK. So, if any nonblocking calls are used, one must use mpi.test or
> mpi.wait to check if they are complete before trying any blocking calls.
That sounds like a different problem than the one I encountered.  The
system did get hung up, but the reason was that processes received
corrupted R objects, threw an error, and stopped responding.

The root of my problem was that objects got garbage collected before the
isend completed.  This will happen regardless of subsequent R-level
calls (e.g., to mpi.test).  The object to be transmitted is serialized
and passed to C, but when the call returns there are no R references to
the object--that is, the serialized version of the object--and so it is
subject to garbage collection.

I'd be happy to provide my modifications to get around this.  Although
they worked for me, they are not really suitable for general use.  There
are two main issues: first, I ignored the asynchronous receive since I
didn't use it.  Since MPI request objects are used for both sending and
receiving, I suspect that mixing irecv's in with code doing isends would
not work right.  I don't think there's any reason in principle why the
handling of isend's couldn't be extended to include irecv's; I just didn't
do it.  I also did not put hooks for the new machinery into the calls that
reset the maximum number of requests.

The second issue is that my fix moved the interface to a slightly
higher level of abstraction.  Request objects and numbers are managed
more by Rmpi than by the user.  Rmpi keeps references to
the serialized objects around as long as the request is outstanding. For
example, the revised mpi.isend does not take a request number; the
function works out one and returns it.  And in general the calls do more
than simply call the corresponding C function.

Ross Boylan
> 
> Hao
> 
> 
> Ross Boylan wrote:
> > I changed the calls to dlopen in Rmpi.c so that it tried libmpi.so
> > before libmpi.so.0.  I also rebuilt MPI, R, and Rmpi as suggested
> > earlier by Bennet Fauber
> > (http://www.open-mpi.org/community/lists/users/2014/03/23823.php).
> > Thanks Bennet!
> >
> > My theory is that the change to dlopen by itself was sufficient.  The
> > rebuilding done before (by others) may have worked because they made the
> > load of libmpi.so.0 fail.  That's not a great theory since a) if there
> > was no libmpi.so.0 on the system it would fail anyway and b) dlopen
> > could probably find libmpi.so.0 in standard system locations regardless
> > of how R was built or LD_LIBRARY_PATHS setup (assuming it didn't find it
> > in a custom place first).
> >
> > Which brings me back to my original problem: mpi.isend.Robj (or possibly
> > mpi.recv.Robj on the other end) did not seem to be working properly.  I
> > had hoped switching to a newer MPI library (1.7.4) would fix this; if
> > anything, it made it worse.  I am sending to a fake receiver (at rank 1)
> > that does nothing but print a message when it gets a message.  r is a
> > list with
> >> length(serialize(r, NULL))  # the mpi.isend.Robj R function serializes
> > the object and then mpi.isend's it.
> > length(serialize(r, NULL))
> > [1] 599499  # ~ 0.5 MB
> >> mpi.send.Robj(1, 1, 4)  # send of number works
> > Fake Assembler: 0 4 numeric
> >> mpi.send.Robj(r, 1, 4)  # send of r works
> > NULL
> >> Fake Assembler: 0 4 list
> > mpi.isend.Robj(1, 1, 4)  # isend of number works
> >> Fake Assembler: 0 4 numeric
> > mpi.isend.Robj(r, 1, 4)  # sometimes this used to work the first time
> >> mpi.send.Robj(r, 1, 4) # sometimes used to get previous message
> > unstuck
> > # never get the command prompt back
> > # presumably mpi.send, the C function, does not return.
> >
> > I might just switch to mpi.send, though the fact that something is going
> > wrong makes me nervous.
> >
> > Obviously given the involvement of R it's not clear the problem lies
> > with the MPI layer, but that seems at least a possibility.
> >
> > Ross
> > On Thu, 2014-03-13 at 12:15 -0700, Ross Boylan wrote:
> >> On Wed, 2014-03-12 at 10:52 -0400, Bennet Fauber wrote:
> >> > My experience with Rmpi and OpenMPI is that it doesn't seem to do well
> >> > with the dlopen or dynamic loading.  I recently installed R 3.0.3, and
> >> > Rmpi, which failed when built against our standard OpenMPI but
> >> > succeeded us