I could roughly understand what coll_ml is and how you
plan to treat it; thanks.
As Ralph pointed out, I didn't see that coll_ml was really used.
I just assumed the slowdown meant it was being used. I'll check it
later. It might be due to the expensive connectivity computation.
Tetsuya
One of the authors of ML mentioned to me off-list that he has an idea what
might have been causing the slowdown. They're actively working on tweaking and
making things better.
I told them to ping you -- the whole point is that ml is supposed to be
*better* than our existing collectives, so if
On Mar 21, 2014, at 3:26 AM, madhurima madhunapanthula
wrote:
> I am trying to link the jumpshot libraries with the graph500 (mpi_tuned_2d
> sources).
> After linking the libraries and executing mpirun with the
> graph500_mpi_custome_n binaries I am getting the following segmentation fault.
Are y
On Mar 21, 2014, at 4:13 PM, Ross Boylan wrote:
> There was a problem in the R code that caused MPI_Request objects to be reused
> before the original request completed.
> Things are working much better now, though some bugs remain (not necessarily
> related to MPI_Isend or Testsome).
>
> Just
Hi Vince,
A couple of ideas off the top of my head:
1. Try disabling eager RDMA. Eager RDMA can consume significant resources:
"-mca btl_openib_use_eager_rdma 0"
2. Try using the TCP BTL - is the error still present?
3. Try the poor man's debugger - print the pid and hostname of the process
wh
On 3/21/2014 10:17 AM, Ross Boylan wrote:
On 3/21/2014 10:02 AM, Jeff Squyres (jsquyres) wrote:
So just to be clear, the C interface for MPI_Testsome is:
int MPI_Testsome(int incount, MPI_Request requests[],
int *outcount, int indices[],
MPI_Status statuses[]);
Hi Ralph,
This is regarding the MapReduce support with OpenMPI for which you gave a
good amount of info previously. I have several MR applications that I'd
like to test for performance in an HPC cluster with OpenMPI. I found this
presentation by you
http://www.open-mpi.org/video/mrplus/Greenplum_R
OpenMPI folks:
I have mentioned before a problem with an in-house code (ScalIT) that
generates the error message
[[31552,1],84][btl_openib_component.c:3492:handle_wc] from
compute-4-5.local to: compute-4-13 error polling LP CQ with status LOCAL
QP OPERATION ERROR status number 2 for wr_id 2
Hi Madhurima,
I'm also having this issue. While I can't provide any assistance, I'd be
interested in being kept abreast of any solution as it may assist me as
well.
~Maddie
On Fri, Mar 21, 2014 at 12:26 AM, madhurima madhunapanthula <
erankima...@gmail.com> wrote:
>
> Hi,
>
> I am trying to lin
Good suggestion.
The overall walltime shows little difference between Intel MPI and Open MPI;
for example: intelmpi = 3.76 min and openmpi = 3.73 min, while PBS Pro shows
intelmpi = 3.82 min and openmpi = 3.80 min.
Beichuan
On 3/21/2014 10:02 AM, Jeff Squyres (jsquyres) wrote:
So just to be clear, the C interface for MPI_Testsome is:
int MPI_Testsome(int incount, MPI_Request requests[],
int *outcount, int indices[],
MPI_Status statuses[]);
And your R call is:
mpi_errhan
***
* EuroMPI/ASIA 2014 Call for Papers *
* The 21st European MPI Users' Group Meeting *
* Kyoto, Japan*
* 9th - 12th September, 2014 *
* www.eurompi2014.org*
***
Do you have any firewalling enabled on these machines? If so, you'll want to
either disable it, or allow random TCP connections between any of the cluster
nodes.
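If disabling the firewall entirely is not an option, one hedged workaround is to pin the TCP BTL to a known port range and open only that range; the MCA parameters below exist in Open MPI, but the exact range and firewall command are our own illustrative choices:

```shell
# Restrict Open MPI's TCP BTL to ports 10000-10100 (example range),
# then allow that range through the firewall on every node.
mpirun --mca btl tcp,self \
       --mca btl_tcp_port_min_v4 10000 \
       --mca btl_tcp_port_range_v4 100 \
       -n 2 -host master,node001 ./helloworldmpi

# e.g. with iptables (distro-specific; adjust to your firewall tool):
# iptables -A INPUT -p tcp --dport 10000:10100 -j ACCEPT
```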
On Mar 21, 2014, at 10:24 AM, Hamid Saeed wrote:
> /sbin/ifconfig
>
> hsaeed@karp:~$ /sbin/ifconfig
> br0 Link encap:Ethern
So just to be clear, the C interface for MPI_Testsome is:
int MPI_Testsome(int incount, MPI_Request requests[],
int *outcount, int indices[],
MPI_Status statuses[]);
And your R call is:
mpi_errhandler(MPI_Testsome(countn, request, &INTEGER(indices)[0],
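For reference against the R wrapper, a minimal C sketch of the MPI_Testsome pattern looks like the following; the names NREQ, done, and the self-send setup are ours, purely for illustration. The key point for the request-reuse bug discussed earlier in the thread: a request slot may only be reused after Testsome reports it in indices[] (at which point the library has set it to MPI_REQUEST_NULL).

```c
/* Sketch of MPI_Testsome usage: poll a set of nonblocking sends and
 * learn, via indices[], which request slots are now safe to reuse. */
#include <mpi.h>
#include <stdio.h>

#define NREQ 4

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Request reqs[NREQ];
    MPI_Status  stats[NREQ];
    int indices[NREQ], outcount, buf[NREQ], recvbuf;

    /* post nonblocking sends to self just to have live requests */
    for (int i = 0; i < NREQ; i++) {
        buf[i] = i;
        MPI_Isend(&buf[i], 1, MPI_INT, rank, i, MPI_COMM_WORLD, &reqs[i]);
    }
    for (int i = 0; i < NREQ; i++)
        MPI_Recv(&recvbuf, 1, MPI_INT, rank, i, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    int done = 0;
    while (done < NREQ) {
        MPI_Testsome(NREQ, reqs, &outcount, indices, stats);
        if (outcount != MPI_UNDEFINED)
            done += outcount;  /* indices[0..outcount-1] are reusable now */
    }

    MPI_Finalize();
    return 0;
}
```

Note that outcount can come back as MPI_UNDEFINED once every request in the array is inactive, which is why the loop checks for it explicitly.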
On Fri, 2014-03-21 at 14:11 +, Jeff Squyres (jsquyres) wrote:
> Is that C or R code?
C.
>
> If it's R, I think the next step would be to check the R wrapper for
> MPI_Testsome and see what is actually being returned by OMPI in C before it
> gets converted to R. I'm afraid I don't know R, so
This is starting to smell like a Libtool and/or Automake bug -- it created
libmpi_usempi_ignore_tkr.dylib, but it tried to install
libmpi_usempi_ignore_tkr.0.dylib (notice the extra ".0"). :-\
This is both good and bad.
Good: I can think of 2 ways to work around this issue off the top of my he
/sbin/ifconfig
hsaeed@karp:~$ /sbin/ifconfig
br0 Link encap:Ethernet HWaddr 00:25:90:59:c9:ba
inet addr:134.106.3.231 Bcast:134.106.3.255 Mask:255.255.255.0
inet6 addr: fe80::225:90ff:fe59:c9ba/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
On Mar 21, 2014, at 10:09 AM, Hamid Saeed wrote:
> > I think I have a TCP connection. As far as I know, my cluster is not
> > configured for InfiniBand (IB).
Ok.
> > but even for tcp connections.
> >
> > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> > mpirun -n 2 -hos
Is that C or R code?
If it's R, I think the next step would be to check the R wrapper for
MPI_Testsome and see what is actually being returned by OMPI in C before it
gets converted to R. I'm afraid I don't know R, so I can't really comment on
the syntax / correctness of your code snippet.
If i
-- Forwarded message --
From: Jeff Squyres (jsquyres)
List-Post: users@lists.open-mpi.org
Date: Fri, Mar 21, 2014 at 3:05 PM
Subject: Re: problem for multiple clusters using mpirun
To: Hamid Saeed
Please reply on the mailing list; more people can reply that way, and the
answers
On Mar 21, 2014, at 8:52 AM, Ralph Castain wrote:
> Looks like you don't have an IB connection between "master" and "node001"
+1
Presumably, you have InfiniBand (or RoCE? Or iWARP?) installed on your cluster,
right? (otherwise, the openib BTL won't be useful for you)
Note that most of the tim
One thing to check would be the time spent between MPI_Init and MPI_Finalize -
i.e., see if the time difference is caused by differences in init and finalize
themselves. My guess is that this is the source; checking would help us target the problem.
On Mar 20, 2014, at 9:00 PM, Beichuan Yan wrote:
> Here
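The timing check suggested above (isolating init/finalize cost from compute time) can be sketched as follows; the bracketing with a monotonic clock and the variable names are our own illustration, not anything from the original codes being compared:

```c
/* Sketch: time init, work, and finalize phases separately so a
 * walltime gap between MPI implementations can be attributed. */
#include <mpi.h>
#include <stdio.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    double t_start = now();
    MPI_Init(&argc, &argv);
    double t_init = now();

    /* ... application work goes here ... */

    double t_work = now();
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Finalize();
    double t_end = now();

    if (rank == 0)
        printf("init: %.3fs  work: %.3fs  finalize: %.3fs\n",
               t_init - t_start, t_work - t_init, t_end - t_work);
    return 0;
}
```

A plain OS clock is used for the outer brackets because MPI_Wtime is not guaranteed usable outside the MPI_Init/MPI_Finalize window.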
On Mar 20, 2014, at 5:56 PM, tmish...@jcity.maeda.co.jp wrote:
>
> Hi Ralph, congratulations on releasing the new openmpi-1.7.5.
>
> By the way, openmpi-1.7.5rc3 has been slowing down our application
> with smaller sizes of testing data, where the time-consuming part
> of our application is so calle
Looks like you don't have an IB connection between "master" and "node001"
On Mar 21, 2014, at 12:43 AM, Hamid Saeed wrote:
> Hello All:
>
> I know there will be someone who can help me in solving this problem.
>
> I can compile my helloworld.c program using mpicc and I have confirmed that
>
Victor gmail.com> writes:
>
> I got 4 x AMD A-10 6800K nodes on loan for a few months and added them to
my existing Intel nodes.
> All nodes share the relevant directories via NFS. I have OpenMPI 1.6.5,
which was built with Open-MX 1.5.3 support, networked via GbE.
>
> All nodes run Ubuntu 12.0
Hello All:
I know there will be someone who can help me in solving this problem.
-
I can compile my helloworld.c program using mpicc, and I have confirmed
that the script runs correctly on another working cluster, so I think the
local paths are set up correctly, and the script defini
Hi,
I am trying to link the jumpshot libraries with the graph500 (mpi_tuned_2d
sources).
After linking the libraries and executing mpirun with the
graph500_mpi_custome_n binaries I am getting the following segmentation fault.
I have no clue as to where the issue is. When I don't link the jumpshot
libra
Here is an example of my data, measured in seconds:
communication overhead = commuT + migraT + print; compuT is computational cost;
totalT = compuT + communication overhead; overhead% denotes the percentage of
communication overhead.
intelmpi (walltime=00:03:51)
iter [commuT migraT