Re: [OMPI users] coll_ml_priority in openmpi-1.7.5

2014-03-21 Thread tmishima
I now have a rough understanding of what coll_ml is and how you plan to handle it, thanks. As Ralph pointed out, I didn't verify that coll_ml was really used; I just assumed the slowdown meant it was. I'll check it later. It might be due to the expensive connectivity computation. Tetsuya > One of …

Re: [OMPI users] coll_ml_priority in openmpi-1.7.5

2014-03-21 Thread Jeff Squyres (jsquyres)
One of the authors of ML mentioned to me off-list that he has an idea what might have been causing the slowdown. They're actively working on tweaking and making things better. I told them to ping you -- the whole point is that ml is supposed to be *better* than our existing collectives, so if …

Re: [OMPI users] Segmentation Fault

2014-03-21 Thread Jeff Squyres (jsquyres)
On Mar 21, 2014, at 3:26 AM, madhurima madhunapanthula wrote: > I am trying to link the jumpshot libraries with the graph500 (mpi_tuned_2d > sources). > After linking the libraries and executing mpirun with the > graph500_mpi_custome_n binaries I am getting the following segmentation fault. Are y…

Re: [OMPI users] testsome returns negative indices [diagnosis]

2014-03-21 Thread Jeff Squyres (jsquyres)
On Mar 21, 2014, at 4:13 PM, Ross Boylan wrote: > There was a problem in the R code that caused MPI_Request objects to be reused > before the original request completed. > Things are working much better now, though some bugs remain (not necessarily > related to MPI_Isend or Testsome). > > Just …

Re: [OMPI users] Call stack upon MPI routine error

2014-03-21 Thread Joshua Ladd
Hi, Vince. A couple of ideas off the top of my head: 1. Try disabling eager RDMA, since it can consume significant resources: "-mca btl_openib_use_eager_rdma 0". 2. Try using the TCP BTL - is the error still present? 3. Try the poor man's debugger: print the PID and hostname of the process wh…

Re: [OMPI users] testsome returns negative indices [diagnosis]

2014-03-21 Thread Ross Boylan
On 3/21/2014 10:17 AM, Ross Boylan wrote: On 3/21/2014 10:02 AM, Jeff Squyres (jsquyres) wrote: So just to be clear, the C interface for MPI_Testsome is: int MPI_Testsome(int incount, MPI_Request requests[], int *outcount, int indices[], MPI_Status statuses[]…

Re: [OMPI users] OpenMPI + Hadoop

2014-03-21 Thread Saliya Ekanayake
Hi Ralph, This is regarding the MapReduce support in OpenMPI, for which you gave a good amount of info previously. I have several MR applications that I'd like to test for performance in an HPC cluster with OpenMPI. I found this presentation by you: http://www.open-mpi.org/video/mrplus/Greenplum_R…

[OMPI users] Call stack upon MPI routine error

2014-03-21 Thread Vince Grimes
OpenMPI folks: I have mentioned before a problem with an in-house code (ScalIT) that generates the error message: [[31552,1],84][btl_openib_component.c:3492:handle_wc] from compute-4-5.local to: compute-4-13 error polling LP CQ with status LOCAL QP OPERATION ERROR status number 2 for wr_id 2…

Re: [OMPI users] Segmentation Fault

2014-03-21 Thread Madison Stemm
Hi Madhurima, I'm also having this issue. While I can't provide any assistance, I'd be interested in being kept abreast of any solution, as it may assist me as well. ~Maddie On Fri, Mar 21, 2014 at 12:26 AM, madhurima madhunapanthula < erankima...@gmail.com> wrote: > > Hi, > > I am trying to lin…

Re: [OMPI users] OpenMPI job initializing problem

2014-03-21 Thread Beichuan Yan
Good suggestion. The overall walltime reveals little difference between Intel MPI and Open MPI; for example, intelmpi=3.76 mins and openmpi=3.73 mins, while PBS Pro shows intelmpi=3.82 mins and openmpi=3.80 mins. Beichuan -Original Message- From: users [mailto:users-boun...@open-mpi.…

Re: [OMPI users] testsome returns negative indices

2014-03-21 Thread Ross Boylan
On 3/21/2014 10:02 AM, Jeff Squyres (jsquyres) wrote: So just to be clear, the C interface for MPI_Testsome is: int MPI_Testsome(int incount, MPI_Request requests[], int *outcount, int indices[], MPI_Status statuses[]); And your R call is: mpi_errhan…

[OMPI users] EuroMPI/ASIA 2014: CFP

2014-03-21 Thread George Bosilca
EuroMPI/ASIA 2014 Call for Papers. The 21st European MPI Users' Group Meeting. Kyoto, Japan, 9th-12th September, 2014. www.eurompi2014.org

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-21 Thread Jeff Squyres (jsquyres)
Do you have any firewalling enabled on these machines? If so, you'll want to either disable it, or allow arbitrary TCP connections between any of the cluster nodes. On Mar 21, 2014, at 10:24 AM, Hamid Saeed wrote: > /sbin/ifconfig > > hsaeed@karp:~$ /sbin/ifconfig > br0 Link encap:Ethern…
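As a hedged alternative to disabling the firewall outright, Open MPI's TCP BTL can be pinned to a fixed port range so the firewall only needs to allow that range between cluster nodes (a sketch; the `btl_tcp_port_min_v4` / `btl_tcp_port_range_v4` MCA parameter names and values here are assumptions to verify with `ompi_info --param btl tcp` on your installation):

```shell
# Sketch: confine the TCP BTL to ports 10000-10099 so a firewall rule
# can allow just that range between cluster nodes.
mpirun -n 2 -host master,node001 \
    --mca btl tcp,sm,self \
    --mca btl_tcp_port_min_v4 10000 \
    --mca btl_tcp_port_range_v4 100 \
    ./helloworldmpi
```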

Re: [OMPI users] testsome returns negative indices

2014-03-21 Thread Jeff Squyres (jsquyres)
So just to be clear, the C interface for MPI_Testsome is: int MPI_Testsome(int incount, MPI_Request requests[], int *outcount, int indices[], MPI_Status statuses[]); And your R call is: mpi_errhandler(MPI_Testsome(countn, request, &INTEGER(indices)[0],…
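Given the signature above, the negative-indices symptom in this thread can be caught with a small sanity checker run right after the call (a hypothetical debugging helper, not part of Open MPI or Rmpi):

```c
/* check_testsome_result: verify that MPI_Testsome's outputs are
   self-consistent: outcount must lie in [0, incount], and every entry
   of indices must name a valid request slot. Returns 1 if sane, 0 if
   not. Negative indices, as reported in this thread, return 0. */
int check_testsome_result(int incount, int outcount, const int *indices) {
    if (outcount < 0 || outcount > incount)
        return 0;
    for (int i = 0; i < outcount; i++) {
        if (indices[i] < 0 || indices[i] >= incount)
            return 0;
    }
    return 1;
}
```

A failing check right after the call (before any language-binding conversion) points at the MPI layer or request reuse rather than the R wrapper.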

Re: [OMPI users] testsome returns negative indices

2014-03-21 Thread Ross Boylan
On Fri, 2014-03-21 at 14:11 +, Jeff Squyres (jsquyres) wrote: > Is that C or R code? C. > > If it's R, I think the next step would be to check the R wrapper for > MPI_Testsome and see what is actually being returned by OMPI in C before it > gets converted to R. I'm afraid I don't know R, so…

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-21 Thread Jeff Squyres (jsquyres)
This is starting to smell like a Libtool and/or Automake bug -- it created libmpi_usempi_ignore_tkr.dylib, but it tried to install libmpi_usempi_ignore_tkr.0.dylib (notice the extra ".0"). :-\ This is both good and bad. Good: I can think of 2 ways to work around this issue off the top of my he…

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-21 Thread Hamid Saeed
hsaeed@karp:~$ /sbin/ifconfig
br0       Link encap:Ethernet  HWaddr 00:25:90:59:c9:ba
          inet addr:134.106.3.231  Bcast:134.106.3.255  Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fe59:c9ba/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1 …

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-21 Thread Jeff Squyres (jsquyres)
On Mar 21, 2014, at 10:09 AM, Hamid Saeed wrote: > > I think I have a TCP connection. As far as I know, my cluster is not > > configured for InfiniBand (IB). Ok. > > But even for TCP connections: > > > > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi > > mpirun -n 2 -hos…

Re: [OMPI users] testsome returns negative indices

2014-03-21 Thread Jeff Squyres (jsquyres)
Is that C or R code? If it's R, I think the next step would be to check the R wrapper for MPI_Testsome and see what is actually being returned by OMPI in C before it gets converted to R. I'm afraid I don't know R, so I can't really comment on the syntax / correctness of your code snippet. If i…

[OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-21 Thread Hamid Saeed
-- Forwarded message -- From: Jeff Squyres (jsquyres) Date: Fri, Mar 21, 2014 at 3:05 PM Subject: Re: problem for multiple clusters using mpirun To: Hamid Saeed. Please reply on the mailing list; more people can reply that way, and the answers…

Re: [OMPI users] problem for multiple clusters using mpirun

2014-03-21 Thread Jeff Squyres (jsquyres)
On Mar 21, 2014, at 8:52 AM, Ralph Castain wrote: > Looks like you don't have an IB connection between "master" and "node001" +1 Presumably you have InfiniBand (or RoCE? Or iWARP?) installed on your cluster, right? (Otherwise, the openib BTL won't be useful for you.) Note that most of the tim…
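One way to act on this advice is to check for a usable IB device before letting the openib BTL be selected (a sketch under assumptions: `ibv_devinfo` is the standard libibverbs query tool and exits nonzero when no devices are found, but verify on your system; hostnames and binary name are from the thread):

```shell
# Sketch: probe for InfiniBand hardware, and fall back to TCP-only
# BTL selection when none is present.
if ibv_devinfo >/dev/null 2>&1; then
    mpirun -n 2 -host master,node001 --mca btl openib,sm,self ./helloworldmpi
else
    # No IB devices: exclude openib and use TCP between nodes.
    mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
fi
```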

Re: [OMPI users] OpenMPI job initializing problem

2014-03-21 Thread Ralph Castain
One thing to check would be the time spent between MPI_Init and MPI_Finalize - i.e., see whether the time difference is caused by differences in init and finalize themselves. My guess is that is the source; it would help us target the problem. On Mar 20, 2014, at 9:00 PM, Beichuan Yan wrote: > Here…

Re: [OMPI users] coll_ml_priority in openmpi-1.7.5

2014-03-21 Thread Ralph Castain
On Mar 20, 2014, at 5:56 PM, tmish...@jcity.maeda.co.jp wrote: > > Hi Ralph, congratulations on releasing the new openmpi-1.7.5. > > By the way, openmpi-1.7.5rc3 has been slowing down our application > on smaller testing data, where the time-consuming part > of our application is so calle…

Re: [OMPI users] problem for multiple clusters using mpirun

2014-03-21 Thread Ralph Castain
Looks like you don't have an IB connection between "master" and "node001" On Mar 21, 2014, at 12:43 AM, Hamid Saeed wrote: > Hello All: > > I know there will be someone who can help me in solving this problem. > > I can compile my helloworld.c program using mpicc and I have confirmed that >…

Re: [OMPI users] Heterogeneous cluster problem - mixing AMD and Intel nodes

2014-03-21 Thread hsaeed
Victor gmail.com> writes: > > I got 4 x AMD A10-6800K nodes on loan for a few months and added them to my existing Intel nodes. > All nodes share the relevant directories via NFS. I have OpenMPI 1.6.5, which was built with Open-MX 1.5.3 support, networked via GbE. > > All nodes run Ubuntu 12.0…

[OMPI users] problem for multiple clusters using mpirun

2014-03-21 Thread Hamid Saeed
Hello All: I know there will be someone who can help me in solving this problem. - I can compile my helloworld.c program using mpicc, and I have confirmed that the script runs correctly on another working cluster, so the local paths are set up correctly, I think, and the script defini…

[OMPI users] Segmentation Fault

2014-03-21 Thread madhurima madhunapanthula
Hi, I am trying to link the jumpshot libraries with the graph500 (mpi_tuned_2d sources). After linking the libraries and executing mpirun with the graph500_mpi_custome_n binaries I am getting the following segmentation fault. I have no clue as to where the issue is. When I don't link the jumpshot libra…

Re: [OMPI users] OpenMPI job initializing problem

2014-03-21 Thread Beichuan Yan
Here is an example of my data, measured in seconds: communication overhead = commuT + migraT + print; compuT is the computational cost; totalT = compuT + communication overhead; overhead% denotes the percentage of communication overhead. intelmpi (walltime=00:03:51) iter [commuT migraT…
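The decomposition in this entry can be written out as a small helper (names mirror the commuT/migraT/print/compuT columns in the message; the function itself is illustrative, not from the poster's code):

```c
/* Communication overhead as a percentage of total walltime, following
   the decomposition in the message: overhead = commuT + migraT + print,
   totalT = compuT + overhead, overhead% = 100 * overhead / totalT. */
double overhead_percent(double commuT, double migraT, double print,
                        double compuT) {
    double overhead = commuT + migraT + print;
    double totalT = compuT + overhead;
    if (totalT <= 0.0)
        return 0.0;
    return 100.0 * overhead / totalT;
}
```

For example, 1 s of communication, 1 s of migration, and 8 s of computation give a 20% overhead.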