On Oct 28, 2010, at 2:50 PM, Ray Muno wrote:
> On 10/28/2010 01:40 PM, Scott Atchley wrote:
>
>>
>> Does your environment have LD_LIBRARY_PATH set to point to $OMPI/lib and
>> $MX/lib? Does it get set on login? Is $OMPI/bin in your PATH?
>>
>> Scott
>
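A minimal sketch of that check, assuming Open MPI is installed under /opt/openmpi and MX under /opt/mx (the paths and the node name are placeholders for your site):
$ which mpirun                                  # should resolve inside your Open MPI tree
$ echo $LD_LIBRARY_PATH | tr ':' '\n'           # should list both the OMPI and MX lib dirs
$ export PATH=/opt/openmpi/bin:$PATH
$ export LD_LIBRARY_PATH=/opt/openmpi/lib:/opt/mx/lib:$LD_LIBRARY_PATH
$ ssh node01 'echo $LD_LIBRARY_PATH'            # node01 is a placeholder; this checks non-interactive logins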
On Oct 28, 2010, at 2:18 PM, Ray Muno wrote:
> On 10/22/2010 07:36 AM, Scott Atchley wrote:
>> Ray,
>>
>> Looking back at your original message, you say that it works if you use the
>> Myricom supplied mpirun from the Myrinet roll. I wonder if this is a
>>
I am wondering if ldd finds the libraries from your compile or from the Myrinet roll.
Scott
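A sketch of that ldd check; the component path is a guess at a typical layout, so adjust it to your install:
$ ldd $(which mpirun)
$ ldd /opt/openmpi/lib/openmpi/mca_btl_mx.so | grep myriexpress
# the point is to see whether libmyriexpress.so resolves to the MX tree your
# build was compiled against or to the copy shipped with the Myrinet roll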
On Oct 21, 2010, at 10:39 AM, Raymond Muno wrote:
> On 10/20/2010 8:30 PM, Scott Atchley wrote:
>> We have fixed this bug in the most recent 1.4.x and 1.5.x releases.
>>
>> Scott
> OK, a few
On Oct 20, 2010, at 9:43 PM, Raymond Muno wrote:
> On 10/20/2010 8:30 PM, Scott Atchley wrote:
>> Are you building OMPI with support for both MX and IB? If not and you only
>> want MX support, try configuring OMPI using --disable-memory-manager (check
>> configure
On Oct 20, 2010, at 9:22 PM, Raymond Muno wrote:
> On 10/20/2010 7:59 PM, Ralph Castain wrote:
>> The error message seems to imply that mpirun itself didn't segfault, but
>> that something else did. Is that segfault pid from mpirun?
>>
>> This kind of problem usually is caused by mismatched buil
Hi Bibrak,
The message about malloc looks like an MX message. Which interconnects did you
compile support for?
If you are using MX, does it appear when you run with:
$ mpirun --mca pml cm -np 4 ./exec 98
which uses the MX MTL instead of the MX BTL.
Scott
On Jul 18, 2010, at 9:23 AM, Bibrak Qamar
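For reference, the two selections look roughly like this; ./exec and its argument are just the example program from above:
$ mpirun --mca pml cm --mca mtl mx -np 4 ./exec 98            # CM PML + MX MTL
$ mpirun --mca pml ob1 --mca btl mx,sm,self -np 4 ./exec 98   # OB1 PML + MX BTL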
Lydia,
Which interconnect is this running over?
Scott
On Jul 15, 2010, at 5:19 AM, Lydia Heck wrote:
> We are running Sun's build of Open MPI 1.3.3r21324-ct8.2-b09b-r31
> (HPC8.2) and one code that runs perfectly fine under
> HPC8.1 (Open MPI) 1.3r19845-ct8.1-b06b-r21 and before fails with
>
On Jun 4, 2010, at 7:18 PM, Audet, Martin wrote:
> Hi OpenMPI_Users and OpenMPI_Developers,
>
> I'm unable to connect a client application using MPI_Comm_connect() to a
> server job (the server job calls MPI_Open_port() before calling
> MPI_Comm_accept()) when the server job uses the MX MTL (alt
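One quick experiment, offered as an assumption rather than anything stated in the thread, is to force the OB1 PML on both sides so the MX MTL/CM path is out of the picture; ./server and ./client are placeholders:
$ mpirun --mca pml ob1 -np 1 ./server               # prints the port name from MPI_Open_port()
$ mpirun --mca pml ob1 -np 1 ./client <port_name>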
On Jun 3, 2010, at 8:54 AM, guillaume ranquet wrote:
> granquet@bordeplage-15 ~ $ mpirun --mca btl mx,openib,sm,self --mca pml
> ^cm --mca mpi_leave_pinned 0 ~/bwlat/mpi_helloworld
> [bordeplage-15.bordeaux.grid5000.fr:02707] Error in mx_init (error No MX
> device entry in /dev.)
> Hello world fro
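On a node that reports that error it is worth checking whether the MX kernel module is loaded and has created its device entry; a rough sketch (module, device, and tool paths may differ between MX versions):
$ lsmod | grep mx
$ ls -l /dev/mx*            # mx_init is looking for a device entry here
$ /opt/mx/bin/mx_info       # /opt/mx is a typical default, adjust as needed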
On Jun 2, 2010, at 1:31 PM, guillaume ranquet wrote:
> granquet@bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun
> - --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
> Hello world from process 0 of 1
> granquet@bordeplage-9 ~/openmpi-1.4.2 $
>
> I can tell it works :)
O
On Jun 2, 2010, at 1:51 PM, Jeff Squyres wrote:
>>> Ok, there is no segfault when it can't find IB.
>
> I'm not sure I follow this comment.
MX initialization is interfering on IB nodes (that do not have MX). I wanted to
make sure the opposite was not true (and it is not). :-)
Scott
On Jun 2, 2010, at 11:52 AM, Scott Atchley wrote:
> What if you explicitly disable MX?
>
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca btl ^mx
> ~/bwlat/mpi_helloworld
And can you try this as well?
~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self -
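Note that the caret form is the usual way to exclude a single component while letting the rest be auto-selected (it cannot be mixed with an include list in the same value); a sketch:
$ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl ^mx ~/bwlat/mpi_helloworld
$ ~/openmpi-1.4.2-bin/bin/mpirun --mca pml ^cm --mca btl ^mx ~/bwlat/mpi_helloworld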
On Jun 2, 2010, at 11:14 AM, guillaume ranquet wrote:
>> What happens if you run:
>>
>> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self
>> ~/bwlat/mpi_helloworld
>>
>> (i.e., MX support is still compiled in, but remove MX from the run-time)
>
> sadly, exactly the same thing :(
> it doe
On Jun 2, 2010, at 9:54 AM, Jeff Squyres wrote:
>> this is the output I get on a node with ethernet and infiniband hardware.
>> note the Error regarding mx.
>>
>> $ ~/openmpi-1.4.2-bin/bin/mpirun ~/bwlat/mpi_helloworld
>> [bordeplage-9.bordeaux.grid5000.fr:32365] Error in mx_init (error No MX
>>
On May 16, 2010, at 1:32 PM, Lydia Heck wrote:
> When running over gigabit using -mca btl tcp,self,sm the code runs alright,
> which is good as the largest part of our cluster is over gigabit, and as
> Gadget-3 scales rather well, the penalty for running over gigabit is not
> prohibitive.
> We
On Jan 14, 2010, at 3:08 PM, Jeff Squyres wrote:
On Jan 14, 2010, at 1:59 PM, TONY BASIL wrote:
I am doing a project with an HPC set up on multicore PowerPC.
Nodes will be connected
using RapidIO instead of Gigabit Ethernet... I would like to know
if OpenMPI supports
RapidIO...
I'm
On Aug 26, 2009, at 4:20 PM, twu...@goodyear.com wrote:
I see. My one script for all clusters calls
mpirun --mca btl openib,mx,gm,tcp,sm,self
so I'd need to add some logic above the mpirun line to figure out what
cluster I am on to set up the correct mpirun line.
still seems like I shou
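A minimal sketch of that wrapper logic, assuming the clusters can be told apart by hostname; the hostname patterns, BTL lists, and application name are placeholders:
case $(hostname) in
  ib-*)  BTLS=openib,sm,self ;;
  mx-*)  BTLS=mx,sm,self ;;
  gm-*)  BTLS=gm,sm,self ;;
  *)     BTLS=tcp,sm,self ;;
esac
mpirun --mca btl $BTLS ./my_app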
On Aug 26, 2009, at 3:41 PM, twu...@goodyear.com wrote:
When, for example, I run on an IB cluster, I get warning messages
about not finding GM NICs and that another transport will be used, etc.
And warnings about mca btl mx components not found, etc. It DOES run
over IB, but it never says that in
On Aug 18, 2009, at 10:59 AM, Oskar Enoksson wrote:
The question is, however, why is cl120 not acking messages? What
is the application? What MPI calls does this application use?
Scott
The reason in this case was that cl120 had some kind of hardware
problem, perhaps memory error or myrin
On Aug 17, 2009, at 2:43 PM, Jeff Squyres wrote:
George / Myricom --
Does the MX MTL abort if it gets a "disconnected" error back from
libmyriexpress?
Short answer: yes.
Long answer:
The messages below indicate that these processes were all trying to
send to cl120. It did not ack their
ase I
attached in my previous mail work. Very suspicious, but at least this
does make a functional solution (however, if I understand OpenMPI
correctly, I shouldn't be able to use the CM PML over a network where
some nodes have MX and some don't, correct?).
Scott Atchley atchley-at-myri.com |o
Hi Kris,
I have not run your code yet, but I will try to this weekend.
You can have MX checksum its messages if you set MX_CSUM=1 and use the
MX debug library (e.g. point LD_LIBRARY_PATH to /opt/mx/lib/debug).
Do you have the problem if you use the MX MTL? To test it modify your
mpirun as follow
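A sketch of that checksum run, assuming MX is installed under /opt/mx; mpirun's -x flag forwards the variables to the remote ranks, and ./your_test is a placeholder:
$ export MX_CSUM=1
$ export LD_LIBRARY_PATH=/opt/mx/lib/debug:$LD_LIBRARY_PATH
$ mpirun -x MX_CSUM -x LD_LIBRARY_PATH --mca pml cm -np 2 ./your_test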
On Jun 28, 2009, at 7:14 AM, Dave Love wrote:
Scott Atchley writes:
George's answer supersedes mine. You must be using the MX bonding
driver to use more than one NIC per host.
Will that be relevant for Open-MX, which I'm using rather than normal
MX? (I'm afraid I don't
On Jun 26, 2009, at 9:45 AM, Dave Love wrote:
Scott Atchley writes:
I believe the answer is yes as long as all NICs are in the same
fabric
(they usually are).
Thanks. Do you mean it won't if, in this case, the two NICs are on
separate switches?
Dave,
George's answer super
On Jun 25, 2009, at 1:02 PM, Dave Love wrote:
Also, Brice Goglin, the Open-MX author had a couple of questions
concerning multi-rail MX while I'm on:
1. Does the MX MTL work with multi-rail?
I believe the answer is yes as long as all NICs are in the same fabric
(they usually are).
2. "Yo
tough2_mp with the Aztec library, however
building it with mpich_mx instead of OpenMPI does not give problems.
Thanks
Henk
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
On
Behalf Of Scott Atchley
Sent: 19 June 2009 23:23
To: Open MPI Users
Subject: Re
On Jun 19, 2009, at 1:05 PM, SLIM H.A. wrote:
Although the mismatch between MX lib version and the kernel version
appears to cause the mx_init error, this should never be called, as there
is no MX card on those nodes.
Thanks in advance for any advice to solve this
Henk
Henk,
Is MX statical
Francois,
How many cores do your machines have?
The file specifies THREADS_DEFAULT 16. Does this spawn 16 threads per
MPI rank?
I see crashes when I run this with MX (BTL with mx,sm,self and MTL).
If I change THREADS_DEFAULT to 4, I see crashes with TCP (BTL with
tcp,sm,self) as well.
On Jun 11, 2009, at 2:20 PM, François Trahay wrote:
The stack trace is from the MX MTL (I attach the backtraces I get
with both MX MTL and MX BTL)
Here is the program that I use. It is quite simple. It runs ping
pongs concurrently (with one thread per node, then with two threads
per node, e
directly on top of the underlying
network capabilities. However, there are clearly few places where
thread safety should be enforced in the MTL layer, and I don't know
if this is the case.
george.
On Jun 11, 2009, at 09:35, Scott Atchley wrote:
Francois,
For threads, the FAQ h
.
Francois
Scott Atchley wrote:
Hi Francois,
I am not familiar with the internals of the OMPI code. Are you
sure, however, that threads are fully supported yet? I was under
the impression that thread support was still partial.
Can anyone else comment?
Scott
On Jun 8, 2009, at 8:43 AM
Hi Francois,
I am not familiar with the internals of the OMPI code. Are you sure,
however, that threads are fully supported yet? I was under the
impression that thread support was still partial.
Can anyone else comment?
Scott
On Jun 8, 2009, at 8:43 AM, François Trahay wrote:
Hi,
I'm en
On Jun 3, 2009, at 6:05 AM, tsi...@coas.oregonstate.edu wrote:
Top always shows all the parallel processes at 100% in the %CPU
field, although some of the time these must be waiting for a
communication to complete. How can I see actual processing as
opposed to waiting at a barrier?
Thanks
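The 100% comes from the MPI library busy-polling while it waits for messages. If you only want top to reflect idle time (at some cost in latency), one hedged option is Open MPI's yield parameter; ./my_app is a placeholder:
$ mpirun --mca mpi_yield_when_idle 1 -np 4 ./my_app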
On May 4, 2009, at 10:54 AM, Ricardo Fernández-Perea wrote:
I finally have the opportunity to run the IMB-3.2 benchmark over Myrinet.
I am running on a cluster of 16 Xserve nodes connected with Myrinet;
15 of them are 8-core and the last one is 4-core, giving
a limit of 124 process
On May 5, 2009, at 2:47 PM, Jeff Squyres wrote:
On May 5, 2009, at 1:59 PM, Robert Kubrick wrote:
I am preparing a presentation where I will discuss commodity
interconnects and the evolution of Ethernet and Infiniband NICs. The
idea is to show the advance in network interfaces speed over time
On Mar 20, 2009, at 11:33 AM, Ricardo Fernández-Perea wrote:
These are the results initially:
Running 1000 iterations.
Length    Latency(us)    Bandwidth(MB/s)
0         2.738          0.000
1         2.718          0.368
2         2.707          0.739
1048576   4392.217
On Mar 20, 2009, at 5:59 AM, Ricardo Fernández-Perea wrote:
Hello,
I am running DL_POLY on various 8-processor Xservers with a Myrinet
network, using mx-1.2.7.
While I keep to a single node the process scales reasonably well, but
the moment I hit the network ...
I would like to try to max
On Dec 5, 2008, at 12:22 PM, Justin wrote:
Does OpenMPI have any known deadlocks that might be causing our
deadlocks?
Known deadlocks, no. We are assisting a customer, however, with a
deadlock that occurs in IMB Alltoall (and some other IMB tests) when
using 128 hosts and the MX BTL. We h
Hi all,
We had a customer using 1.2.6 with MX. We were running his jobs, some
of which used the MX BTL and some used the MX MTL.
He added a few more nodes to the cluster and installed the same OMPI.
When we tried to run jobs that spanned the new nodes, the jobs failed.
I did not keep the
Jeff,
If I remember correctly, Microsoft dropped support for .AVI 3-4 years
ago so it can no longer be played by their media player. It is also
not native to QT, so you will have to download a plugin (I have it
somewhere if you want me to look for it).
I do not know if there is a containe
On Jul 10, 2007, at 3:24 PM, Tim Prins wrote:
On Tuesday 10 July 2007 03:11:45 pm Scott Atchley wrote:
On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote:
Tim, starting with the recently released 1.2.1, it is the default.
To clarify, MX_RCACHE=1 is the default.
It would be good for the
On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote:
Tim, starting with the recently released 1.2.1, it is the default.
To clarify, MX_RCACHE=1 is the default.
Scott
always use MX_RCACHE=2 for both MTL and BTL. So far I haven't had
any problems with it.
george.
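For completeness, MX_RCACHE is an MX environment variable, so it can be set per run and forwarded to the ranks with mpirun's -x flag; ./my_app is a placeholder:
$ mpirun -x MX_RCACHE=1 --mca btl mx,sm,self -np 4 ./my_app
$ mpirun -x MX_RCACHE=2 --mca pml cm -np 4 ./my_app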
On Jul 10, 2007, at 2:37 PM, Brian Barrett wrote:
On Jul 10, 2007, at 11:40 AM, Scott Atchley wrote:
On Jul 10, 2007, at 1:14 PM, Christopher D. Maestas wrote:
Has anyone seen the following me
On Jul 10, 2007, at 2:37 PM, Brian Barrett wrote:
Scott -
I'm having trouble getting the warning to go away with Open MPI.
I've disabled our copy of ptmalloc2, so we're not providing a malloc
anymore. I'm wondering if there's also something with the use of
DSOs to load libmyriexpress? Is your
anage memory, so MX_RCACHE=1 is safe to use unless
the user's application manages memory.
Scott
--
Scott Atchley
Myricom Inc.
http://www.myri.com
On Jul 6, 2007, at 7:37 AM, SLIM H.A. wrote:
Dear Michael
I have now tried both
mpirun --mca btl mx,sm -np 4 ./cpi
which gives the same error message again, and,
mpirun --mca btl mx,sm,self -np 4 ./cpi_gcc_ompi_mx
actually locks some of the mx ports but not all 4, i.e. this is the
output fr
On Feb 14, 2007, at 12:33 PM, Alex Tumanov wrote:
Hello,
I recently tried running HPLinpack, compiled with OMPI, over myrinet
MX interconnect. Running a simple hello world program works, but XHPL
fails with an error occurring when it tries to MPI_Send:
# mpirun -np 4 -H l0-0,c0-2 --prefix $MPI
On Jan 18, 2007, at 8:11 AM, Peter Kjellstrom wrote:
with Lustre, which is about 55% of the
theoretical 20 Gb/s advertised speed.
I think this should be calculated against 16 Gbps, not 20 Gbps.
What is the advertised speed of an IB DDR card?
http://mellanox.com/products/hca_cards.php
http://
On Jan 18, 2007, at 5:05 AM, Peter Kjellstrom wrote:
On Thursday 18 January 2007 09:52, Robin Humble wrote:
...
is ~10Gbit the best I can expect from 4x DDR IB with MPI?
some docs @HP suggest up to 16Gbit (data rate) should be possible,
and
I've heard that 13 or 14 has been achieved before.
On Jan 17, 2007, at 10:45 AM, Brian Budge wrote:
Hi Adrian -
Thanks for the reply. I have been investigating this further. It
appears that ssh isn't starting my .zshrc file. This is strange.
You should check the zsh man page. .zshrc is for interactive logins
only. You may want to use .
son working on the DR PML like me to try any more tests?
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Dec 7, 2006, at 9:50 AM, Scott Atchley wrote:
On Dec 6, 2006, at 3:09 PM, Scott Atchley wrote:
Brock and Galen,
We are willing to assist. Our best guess is t
On Dec 6, 2006, at 3:09 PM, Scott Atchley wrote:
Brock and Galen,
We are willing to assist. Our best guess is that OMPI is using the
code in a way different than MPICH-GM does. One of our other
developers who is more comfortable with the GM API is looking into it.
We tried running with HPCC
On Dec 6, 2006, at 2:29 PM, Brock Palen wrote:
I wonder if we can narrow this down a bit to perhaps a PML protocol
issue.
Start by disabling RDMA by using:
-mca btl_gm_flags 1
On the other hand, with OB1, using btl_gm_flags 1 fixed the error
problem with OMPI! Which is a great first step.
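Spelled out as a full command line against xhpl from the earlier runs, the RDMA-disabled GM test looks something like this (a sketch, not a quoted command):
$ mpirun --mca pml ob1 --mca btl gm,sm,self --mca btl_gm_flags 1 -np 4 ./xhpl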
On Dec 5, 2006, at 6:15 PM, Galen M. Shipman wrote:
Brock Palen wrote:
I was asked by Myricom to run a test using the data reliability PML
(dr). I ran it like so:
$ mpirun --mca pml dr -np 4 ./xhpl
Is this the right format for running the dr pml?
This should be fine, yes.
I can running H
On Nov 29, 2006, at 8:44 AM, Scott Atchley wrote:
My last few runs all completed successfully without hanging. The job
I am currently running just hung one node (can respond to ping,
cannot ssh into it, cannot use any terminals connected to it).
There are no messages in dmesg and vmstat shows
On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:
I had sent a message two weeks ago about this problem and talked with
Jeff at SC06 about how it might not be an OMPI problem. But it
now appears, working with Myricom, that it is a problem in both
lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically the re