A workaround is to add --disable-vt to your configure line if you do not care
about having VampirTrace support.
Not a solution, but might help you make progress.
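For example, a full configure invocation might look like this (the install prefix here is only a placeholder; keep the rest of your usual options):
./configure --prefix=/opt/openmpi --disable-vt
make all install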
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ilias Miroslav
Sent: Friday, November 13, 2015 11:30 AM
To: us...@open-mpi.
>
>Sent by Apple Mail
>
>Yang ZHANG
>
>PhD candidate
>
>Networking and Wide-Area Systems Group
>Computer Science Department
>New York University
>
>715 Broadway Room 705
>New York, NY 10003
>
>> On Sep 25, 2015, at 11:07 AM, Rolf vandeVaart
>wro
Hello Yang:
It is not clear to me if you are asking about a CUDA-aware build of Open MPI
where you do the MPI_Allreduce() on the GPU buffer, or if you are handling the
staging of the GPU buffer into host memory yourself and then calling MPI_Allreduce(). Either
way, they are somewhat similar. With CUDA-aware, the
Lev:
Can you run with --mca mpi_common_cuda_verbose 100 --mca mpool_rgpusm_verbose
100 and send me (rvandeva...@nvidia.com) the output of that.
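For example, assuming your executable is ./your_app (a placeholder) and two ranks:
mpirun -np 2 --mca mpi_common_cuda_verbose 100 --mca mpool_rgpusm_verbose 100 ./your_app 2>&1 | tee cuda-verbose.log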
Thanks,
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
>Sent: Wednesday, September 02, 2015 7:1
here I can
>look, I could help to find the issue.
>
>Thanks a lot!
>
>Marcin
>
>
>On 08/28/2015 05:28 PM, Rolf vandeVaart wrote:
>> I am not sure why the distances are being computed as you are seeing. I do
>not have a dual rail card system to reproduce with. Ho
I am not sure why the distances are being computed as you are seeing. I do not
have a dual rail card system to reproduce with. However, short term, I think
you could get what you want by running like the following. The first argument
tells the selection logic to ignore locality, so both cards w
No, it is not. You have to use pml ob1 which will pull in the smcuda and
openib BTLs which have CUDA-aware built into them.
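For example (./your_app and the rank count are placeholders):
mpirun -np 2 --mca pml ob1 ./your_app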
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Subhra Mazumdar
Sent: Friday, August 21, 2015 12:18 AM
To: Open MPI Users
Subject: [OMPI users] cuda aware
paul...@us.ibm.com>
www.ibm.com
- Original message -
From: Rolf vandeVaart <rvandeva...@nvidia.com>
Sent by: "users" <users-boun...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Cc:
Subject: Re: [OMPI
I talked with Jeremia off list and we figured out what was going on. There is
the ability to use the cuMemcpyAsync/cuStreamSynchronize rather than the
cuMemcpy but it was never made the default for Open MPI 1.8 series. So, to get
that behavior you need the following:
--mca mpi_common_cuda_cum
Hi Shahzeb:
I believe another colleague of mine may have helped you with this issue (I was
not around last week). However, to help me better understand the issue you are
seeing, could you send me your config.log file from when you did the
configuration? You can just send to rvandeva...@nvidia
Just an FYI that this issue has been found and fixed and will be available in
the next release.
https://github.com/open-mpi/ompi-release/pull/357
Rolf
From: Rolf vandeVaart
Sent: Wednesday, July 01, 2015 4:47 PM
To: us...@open-mpi.org
Subject: RE: [OMPI users] 1.8.6 w/ CUDA 7.0 & GDR
Hi Stefan (and Steven who reported this earlier with CUDA-aware program)
I have managed to observe the leak when running LAMMPS as well. Note that
this has nothing to do with CUDA-aware features. I am going to move this
discussion to the Open MPI developer’s list to dig deeper into this iss
how you observed the behavior. Does the code need to run for a while to
see this?
Any suggestions on how I could reproduce this?
Thanks,
Rolf
From: Steven Eliuk [mailto:s.el...@samsung.com]
Sent: Tuesday, June 30, 2015 6:05 PM
To: Rolf vandeVaart
Cc: Open MPI Users
Subject: 1.8.6 w/ CUDA 7.0
-aware MPI_Reduce problem in Openmpi 1.8.5
Hi Rolf,
Thank you very much for clarifying the problem. Is there any plan to support
GPU RDMA for reduction in the future?
On Jun 17, 2015, at 1:38 PM, Rolf vandeVaart
<rvandeva...@nvidia.com> wrote:
Hi Fei:
The reduction support fo
Hi Fei:
The reduction support for CUDA-aware in Open MPI is rather simple. The GPU
buffers are copied into temporary host buffers and then the reduction is done
with the host buffers. At the completion of the host reduction, the data is
copied back into the GPU buffers. So, there is no use o
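If you wanted to do that staging yourself, a minimal sketch of the pattern looks like this (the function name, buffer names, datatype, and MPI_SUM operation are placeholders; error checking is omitted):
#include <stdlib.h>
#include <mpi.h>
#include <cuda_runtime.h>

/* Reduce a GPU buffer by staging it through host memory. */
void staged_allreduce(double *gpu_buf, int count, MPI_Comm comm)
{
    size_t bytes = count * sizeof(double);
    double *host_in  = malloc(bytes);
    double *host_out = malloc(bytes);
    cudaMemcpy(host_in, gpu_buf, bytes, cudaMemcpyDeviceToHost);   /* device -> host */
    MPI_Allreduce(host_in, host_out, count, MPI_DOUBLE, MPI_SUM, comm);
    cudaMemcpy(gpu_buf, host_out, bytes, cudaMemcpyHostToDevice);  /* host -> device */
    free(host_in);
    free(host_out);
}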
I think we bumped up a default value in Open MPI 1.8.5. To go back to the old
64Mbyte value try running with:
--mca mpool_sm_min_size 67108864
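For example, as part of a full command line (./your_app is a placeholder):
mpirun -np 4 --mca mpool_sm_min_size 67108864 ./your_app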
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Aurélien Bouteiller
Sent: Tuesday, May 26, 2015 10:10 AM
To: Open MPI Users
Subject:
ti-Process Service
>
>Received from Lev Givon on Thu, May 21, 2015 at 11:32:33AM EDT:
>> Received from Rolf vandeVaart on Wed, May 20, 2015 at 07:48:15AM EDT:
>>
>> (snip)
>>
>> > I see that you mentioned you are starting 4 MPS daemons. Are
-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
>Sent: Tuesday, May 19, 2015 10:25 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] cuIpcOpenMemHandle failure when using
>OpenMPI 1.8.5 with CUDA 7.0 and Multi-Process Service
>
&
I am not sure why you are seeing this. One thing that is clear is that you
have found a bug in the error reporting. The error message is a little garbled
and I see a bug in what we are reporting. I will fix that.
If possible, could you try running with --mca btl_smcuda_use_cuda_ipc 0. My
exp
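For example (rank count and ./your_app are placeholders):
mpirun -np 2 --mca btl_smcuda_use_cuda_ipc 0 ./your_app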
Hi Lev:
Any chance you can try Open MPI 1.8.5rc3 and see if you see the same behavior?
That code has changed a bit from the 1.8.4 series and I am curious if you will
still see the same issue.
http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.5rc3.tar.gz
Thanks,
Rolf
>-Ori
I still do not believe there is a way for you to steer your traffic based on
the thread that is calling into Open MPI. While you can spawn your own threads,
Open MPI is going to figure out what interfaces to use based on the
characteristics of the process during MPI_Init. Even if Open MPI decid
It is my belief that you cannot do this at least with the openib BTL. The IB
card to be used for communication is selected during the MPI_Init() phase
based on where the CPU process is bound to. You can see some of this selection
by using the --mca btl_base_verbose 1 flag. There is a bunch o
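For example, something like this will print the interface selection at startup (./your_app is a placeholder):
mpirun -np 2 --mca btl_base_verbose 1 ./your_app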
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rolf
>vandeVaart
>Sent: Monday, March 30, 2015 9:37 AM
>To: Open MPI Users
>Subject: Re: [OMPI users] segfault during MPI_Isend when transmitting GPU
>arrays between multiple GPUs
>
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
>Sent: Sunday, March 29, 2015 10:11 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] segfault during MPI_Isend when transmitting GPU
>arrays between multiple GPUs
>
>Recei
Hi Lev:
I am not sure what is happening here but there are a few things we can do to
try and narrow things down.
1. If you run with --mca btl_smcuda_use_cuda_ipc 0 then I assume this error
will go away?
2. Do you know if when you see this error it happens on the first pass through
your communica
Hi Jason:
The issue is that Open MPI is (presumably) a 64 bit application and it is
trying to load up a 64-bit libcuda.so.1 but not finding one. Making the link
as you did will not fix the problem (as you saw). In all my installations, I
also have a 64-bit driver installed in /usr/lib64/libcud
retry with a pre-release
version of Open MPI 1.8.5 that is available here and confirm it fixes your
issue. Any of the ones listed on that page should be fine.
http://www.open-mpi.org/nightly/v1.8/
Thanks,
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent
Let me try to reproduce this. This should not have anything to do with GPU
Direct RDMA. However, to eliminate it, you could run with:
--mca btl_openib_want_cuda_gdr 0.
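For example (rank count and ./your_app are placeholders):
mpirun -np 2 --mca btl_openib_want_cuda_gdr 0 ./your_app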
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Aulwes, Rob
Sent: Wednesday, February 11, 2015 2:17 PM
To: u
I think I found a bug in your program with how you were allocating the GPU
buffers. I will send you a version offlist with the fix.
Also, there is no need to rerun with the flags I had mentioned below.
Rolf
From: Rolf vandeVaart
Sent: Monday, January 12, 2015 9:38 AM
To: us...@open-mpi.org
That is strange, not sure why that is happening. I will try to reproduce with
your program on my system. Also, perhaps you could rerun with --mca
mpi_common_cuda_verbose 100 and send me that output.
Thanks
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Xun Gong
Sent: Sunday, Janu
The CUDA person is now responding. I will try and reproduce. I looked through
the zip file but did not see the mpirun command. Can this be reproduced with
-np 4 running across four nodes?
Also, in your original message you wrote "Likewise, it doesn't matter if I
enable CUDA support or not. "
.
Also, our defaults for openmpi-mca-params.conf are:
mtl=^mxm
btl=^usnic,tcp
btl_openib_flags=1
service nv_peer_mem status
nv_peer_mem module is loaded.
Kindest Regards,
-
Steven Eliuk,
From: Rolf vandeVaart <rvandeva...@nvidia.com>
Reply-To: Open MPI Users <us...@open-m
The error 304 corresponds to CUDA_ERROR_OPERATING_SYSTEM which means an OS call
failed. However, I am not sure how that relates to the call that is getting
the error.
Also, the last error you report is from MVAPICH2-GDR, not from Open MPI. I
guess then I have a few questions.
1. Can yo
If you are utilizing the CUDA-aware support in Open MPI, can you send me an
email with some information about the application and the cluster you are on.
I will consolidate information.
Thanks,
Rolf (rvandeva...@nvidia.com)
-
Yes, I have reproduced. And I agree with your thoughts on configuring vs
runtime error. I will look into this.
Thanks,
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Brock Palen
>Sent: Friday, September 05, 2014 5:22 PM
>To: Open MPI Users
>Subjec
Hi Christoph:
I will try and reproduce this issue and will let you know what I find. There
may be an issue with CUDA IPC support with certain traffic patterns.
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Christoph Winter
Sent: Tuesday, August 26, 2014 2:46 AM
To: us...@open
odes, I had
>CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
>
>instead of
>CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
>
>Sorry for the false bug and thanks for directing me toward the solution.
>
>Maxime
>
>
>On 2014-08-19 09:15, Rolf vandeVaart wrote:
>>
gpu-k20-08:46045] *** End of error message ***
>--
>mpiexec noticed that process rank 1 with PID 46045 on node gpu-k20-08
>exited on signal 11 (Segmentation fault).
>---
Just to help reduce the scope of the problem, can you retest with a non
CUDA-aware Open MPI 1.8.1? And if possible, use --enable-debug in the
configure line to help with the stack trace?
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
>Boissonn
With Open MPI 1.8.1, the library will use the NIC that is "closest" to the CPU.
There was a bug in earlier versions of Open MPI 1.8 so that did not happen.
You can see this by running with some verbosity using the "btl_base_verbose"
flag. For example, this is what I observed on a two node clus
Do you need the VampirTrace support in your build? If not, you could add this to
configure.
--disable-vt
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of
>jcabe...@computacion.cs.cinvestav.mx
>Sent: Monday, June 16, 2014 1:40 PM
>To: us...@open-mpi.org
>Sub
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
>Boissonneault
>Sent: Tuesday, May 27, 2014 4:07 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI
>
>Answers inline too.
>>> 2) Is the absence of btl_ope
Answers inline...
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
>Boissonneault
>Sent: Friday, May 23, 2014 4:31 PM
>To: Open MPI Users
>Subject: [OMPI users] Advices for parameter tuning for CUDA-aware MPI
>
>Hi,
>I am currently configuring a GPU c
.12
>1048576 765.65
>
>
>Can you clarify exactly where the problem come from?
>
>Regards,
>Filippo
>
>
>On Mar 4, 2014, at 12:17 AM, Rolf vandeVaart
>wrote:
>> Can you try running with --mca coll ^ml and see if things work?
>>
>>
Can you try running with --mca coll ^ml and see if things work?
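For example (rank count and ./your_app are placeholders):
mpirun -np 4 --mca coll ^ml ./your_app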
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Filippo Spiga
>Sent: Monday, March 03, 2014 7:14 PM
>To: Open MPI Users
>Subject: [OMPI users] 1.7.5rc1, error "COLL-ML ml_discover_hiera
I assume your first issue is happening because you configured hwloc with cuda
support which creates a dependency on libcudart.so. Not sure why that would
mess up Open MPI. Can you send me how you configured hwloc?
I am not sure I understand the second issue. Open MPI puts everything in lib
e
Yes, this was a bug with Open MPI 1.7.3. I could not reproduce it, but it was
definitely an issue in certain configurations.
Here was the fix. https://svn.open-mpi.org/trac/ompi/changeset/29762
We fixed it in Open MPI 1.7.4 and the trunk version, so as you have seen, they
do not have the prob
e a reasoning for this? Is there some documentation,
>which MPI calls are CUDA-aware and which not?
>
>Best regards
>
>Peter
>
>
>
>On 12/02/2013 02:18 PM, Rolf vandeVaart wrote:
>> Thanks for the report. CUDA-aware Open MPI does not currently supp
Thanks for the report. CUDA-aware Open MPI does not currently support doing
reduction operations on GPU memory.
Is this a feature you would be interested in?
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Peter Zaspel
>Sent: Friday, November 29, 2
The CUDA-aware support is only available when running with the verbs interface
to Infiniband. It does not work with the PSM interface which is being used in
your installation.
To verify this, you need to disable the usage of PSM. This can be done in a
variety of ways, but try running like this
Let me try this out and see what happens for me. But yes, please go ahead and
send me the complete backtrace.
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of KESTENER Pierre
Sent: Wednesday, October 30, 2013 11:34 AM
To: us...@open-mpi.org
Cc: KESTENER Pierre
Subject: [OMPI use
>Laboratories, NM, USA
>
>
>
>
>
>
>On 10/7/13 1:47 PM, "Rolf vandeVaart" wrote:
>
>>That might be a bug. While I am checking, you could try configuring with
>>this additional flag:
>>
>>--enable-mca-no-build=pml-bfo
>>
>>
That might be a bug. While I am checking, you could try configuring with this
additional flag:
--enable-mca-no-build=pml-bfo
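For example, appended to whatever configure options you already use (the prefix is only a placeholder):
./configure --prefix=/opt/openmpi --enable-mca-no-build=pml-bfo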
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Hammond,
>Simon David (-EXP)
>Sent: Monday, October 07, 2013 3:30 PM
>To:
We have done some work over the last year or two to add some CUDA-aware support
into the Open MPI library. Details on building and using the feature are here.
http://www.open-mpi.org/faq/?category=building#build-cuda
http://www.open-mpi.org/faq/?category=running#mpi-cuda-support
I am looking fo
3 2:59 PM
>To: Open MPI Users
>Cc: Rolf vandeVaart
>Subject: Re: [OMPI users] Trouble configuring 1.7.2 for Cuda 5.0.35
>
>Thank you for the quick reply Rolf,
> I personally don't know the Cuda libraries. I was hoping there had been a
>name change. I am on a Cray
It is looking for the libcuda.so file, not the libcudart.so file. So, maybe
--with-libdir=/usr/lib64
You need to be on a machine with the CUDA driver installed. What was your
configure command?
http://www.open-mpi.org/faq/?category=building#build-cuda
Rolf
>-Original Message-
>From:
With respect to the CUDA-aware support, Ralph is correct. The ability to send
and receive GPU buffers is in the Open MPI 1.7 series. And incremental
improvements will be added to the Open MPI 1.7 series. CUDA 5.0 is supported.
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.
Ed, how large are the messages that you are sending and receiving?
Rolf
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Ed Blosch
Sent: Thursday, June 27, 2013 9:01 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Application hangs on mpi_waitall
It ran a bit
I can speak to part of your issue. There are no CUDA-aware features in the 1.6
series of Open MPI. Therefore, the various configure flags you tried would not
affect Open MPI itself. Those configure flags are relevant with the 1.7 series
and later, but as the FAQ says, the CUDA-aware feature i
Yes, unfortunately, that issue is still unfixed. I just created the ticket and
included a possible workaround.
https://svn.open-mpi.org/trac/ompi/ticket/3531
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Russell Power
>Sent: Mond
Hi Justin:
I assume you are running on a single node. In that case, Open MPI is supposed
to take advantage of the CUDA IPC support. This will be used only when
messages are larger than 4K, which yours are. In that case, I would have
expected that the library would exchange some messages and
Not sure. I will look into this. And thank you for the feedback Jens!
Rolf
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Jeff Squyres
>Sent: Thursday, November 08, 2012 8:49 AM
>To: Open MPI Users
>Subject: Re: [OMPI users] mpi_l
And just to give a little context, ompi-clean was created initially to "clean"
up a node, not for cleaning up a specific job. It was for the case where MPI
jobs would leave some files behind or leave some processes running. (I do not
believe this happens much at all anymore.) But, as was said
To answer the original questions, Open MPI will look at taking advantage of the
RDMA CUDA when it is available. Obviously, work needs to be done to figure out
the best way to integrate into the library. Much like there are a variety of
protocols under the hood to support host transfer of data
>-Original Message-
>From: Jeff Squyres [mailto:jsquy...@cisco.com]
>Sent: Thursday, August 09, 2012 9:45 AM
>To: Open MPI Users
>Cc: Rolf vandeVaart
>Subject: CUDA in v1.7? (was: Compilation of OpenMPI 1.5.4 & 1.6.X fail for PGI
>compiler...)
>
>On Aug 9,
The current implementation does assume that the GPUs are on the same IOH and
therefore can use the IPC features of the CUDA library for communication.
One of the initial motivations for this was that to be able to detect whether
GPUs can talk to one another, the CUDA library has to be initialized
Yes, this feature is in Open MPI 1.7. It is implemented in the "smcuda" btl.
If you configure as outlined in the FAQ, then things should just work. The
smcuda btl will be selected and P2P will be used between GPUs on the same node.
This is only utilized on transfers of buffers that are large
pi.org] On Behalf
Of Rolf vandeVaart
Sent: Monday, June 18, 2012 11:00 AM
To: Open MPI Users
Cc: Олег Рябков
Subject: Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does
not take arguments
Hi Dmitry:
Let me look into this.
Rolf
From: users-boun...@open-mpi.org [mailto:users-
Hi Dmitry:
Let me look into this.
Rolf
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Dmitry N. Mikushin
Sent: Monday, June 18, 2012 10:56 AM
To: Open MPI Users
Cc: Олег Рябков
Subject: Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does
not ta
You should be running with one GPU per MPI process. If I understand correctly,
you have a 3 node cluster and each node has a GPU so you should run with np=3.
Maybe you can try that and see if your numbers come out better.
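For example, with three nodes that each have one GPU (hostnames and ./your_app are placeholders):
mpirun -np 3 -host node1,node2,node3 ./your_app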
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On B
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Don Armstrong
>Sent: Thursday, May 03, 2012 5:43 PM
>To: us...@open-mpi.org
>Subject: Re: [OMPI users] MPI over tcp
>
>On Thu, 03 May 2012, Rolf vandeVaar
I tried your program on a single node and it worked fine. Yes, TCP message
passing in Open MPI has been working well for some time.
I have a few suggestions.
1. Can you run something like hostname successfully (mpirun -np 10 -hostfile
yourhostfile hostname)
2. If that works, then you can also ru
I am not sure about everything that is going wrong, but there are at least two
issues I found.
First, you are skipping the first line that you read from integers.txt. Maybe
something like this instead.
while (fgets(line, sizeof line, fp) != NULL) {
    sscanf(line, "%d", &data[k]);
    sum = sum + data[k];
    k++;
}
Yes, they are supported in the sense that they can work together. However, if
you want to have the ability to send/receive GPU buffers directly via MPI
calls, then I recommend you get CUDA 4.1 and use the Open MPI trunk.
http://www.open-mpi.org/faq/?category=building#build-cuda
Rolf
From: use
Open MPI cannot handle having two interfaces on a node on the same subnet. I
believe it has to do with our matching code when we try to match up a
connection.
The result is a hang as you observe. I also believe it is not good practice to
have two interfaces on the same subnet.
If you put them
, December 14, 2011 10:47 AM
To: Open MPI Users
Cc: Rolf vandeVaart
Subject: Re: [OMPI users] How "CUDA Init prior to MPI_Init" co-exists with
unique GPU for each MPI process?
Hi,
Processes are not spawned by MPI_Init. They are spawned before by some
applications between your mpirun cal
Actually, that is not quite right. From the FAQ:
"This feature currently only exists in the trunk version of the Open MPI
library."
You need to download and use the trunk version for this to work.
http://www.open-mpi.org/nightly/trunk/
Rolf
From: users-boun...@open-mpi.org [mailto:users-boun
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Chris Cooper
>Sent: Friday, October 14, 2011 1:28 AM
>To: us...@open-mpi.org
>Subject: [OMPI users] gpudirect p2p?
>
>Hi,
>
>Are the recent peer to peer capabilities of cuda leveraged by
>> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't
>happen until the third iteration. I take that to mean that the basic
>communication works, but that something is saturating. Is there some notion
>of buffer size somewhere in the MPI system that could explain this?
>
Hi Fengguang:
That is odd that you see the problem even when running with the openib flags
set as Brice indicated. Just to be extra sure there are no typo errors in your
flag settings, maybe you can verify with the ompi_info command like this?
ompi_info -mca btl_openib_flags 304 -param btl ope
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Brice Goglin
Sent: Monday, February 28, 2011 2:14 PM
To: Open MPI Users
Subject: Re: [OMPI users] anybody tried OMPI with gpudirect?
On 28/02/2011 19:49, Rolf vandeVaart wrote:
>
] anybody tried OMPI with gpudirect?
On 28/02/2011 17:30, Rolf vandeVaart wrote:
> Hi Brice:
> Yes, I have tried OMPI 1.5 with gpudirect and it worked for me. You
> definitely need the patch or you will see the behavior just as you described,
> a hang. One thing you could try is d
Hi Brice:
Yes, I have tried OMPI 1.5 with gpudirect and it worked for me. You definitely
need the patch or you will see the behavior just as you described, a hang. One
thing you could try is disabling the large message RDMA in OMPI and see if that
works. That can be done by adjusting the openi
Hi James:
I can reproduce the problem on a single node with Open MPI 1.5 and the
trunk. I have submitted a ticket with
the information.
https://svn.open-mpi.org/trac/ompi/ticket/2656
Rolf
On 12/13/10 18:44, James Dinan wrote:
Hi,
I'm getting strange behavior using datatypes in a one-sided
nal flags. I think this should work,
but I am only reading
what is in the ticket.
Rolf
On 11/29/10 16:26, Nehemiah Dacres wrote:
that looks about right. So the suggestion:
./configure LDFLAGS="-notpath ... ... ..."
-notpath should be replaced by whatever the proper flag
This problem looks a lot like a thread from earlier today. Can you look
at this
ticket and see if it helps? It has a workaround documented in it.
https://svn.open-mpi.org/trac/ompi/ticket/2632
Rolf
On 11/29/10 16:13, Prentice Bisbal wrote:
No, it looks like ld is being called with the optio
Ethan:
Can you run just "hostname" successfully? In other words, a non-MPI
program.
If that does not work, then we know the problem is in the runtime. If
it does work, then
there is something with the way the MPI library is setting up its
connections.
Is there more than one interface on
Hi Eloi:
To select the different bcast algorithms, you need to add an extra mca
parameter that tells the library to use dynamic selection.
--mca coll_tuned_use_dynamic_rules 1
One way to make sure you are typing this in correctly is to use it with
ompi_info. Do the following:
ompi_info -mca
/3/25 Rolf Vandevaart <rolf.vandeva...@sun.com>
They will automatically be used by the library. There is nothing
special that you need to do. Unfortunately, there is no simple way
to tell if they are being used. I would suggest that you
specifically call th
They will automatically be used by the library. There is nothing
special that you need to do. Unfortunately, there is no simple way to
tell if they are being used. I would suggest that you specifically call
them out in different calls to mpirun to make sure they are both
working. If they bo
On 03/01/10 11:51, Ralph Castain wrote:
On Mar 1, 2010, at 8:41 AM, David Turner wrote:
On 3/1/10 1:51 AM, Ralph Castain wrote:
Which version of OMPI are you using? We know that the 1.2 series was unreliable
about removing the session directories, but 1.3 and above appear to be quite
good ab
Hi:
I assume if you wait several minutes then your program will actually
time out, yes? I guess I have two suggestions. First, can you run a
non-MPI job using the wireless? Something like hostname? Secondly, you
may want to specify the specific interfaces you want it to use on the
two machi
Hi, how exactly do you run this to get this error? I tried and it
worked for me.
burl-ct-x2200-16 50 =>mpirun -mca btl_openib_warn_default_gid_prefix 0
-mca btl self,sm,openib -np 2 -host burl-ct-x2200-16,burl-ct-x2200-17
-mca btl_openib_ib_timeout 16 a.out
I am 0 at 1252670691
I am 1 at 125
Hi Paul:
I tried running the same way as you did and I saw the same thing. I
was using ClusterTools 8.2 (Open MPI 1.3.3r21324) and running on
Solaris. I looked at the mpirun process and it was definitely consuming
approximately 12 file descriptors per a.out process.
burl-ct-v440-0 59 =
This message is telling you that you have run out of file descriptors.
I am surprised that the -mca parameter setting did not fix the problem.
Can you run limit or ulimit on your shell and send the information? I
typically set my limit to 65536 assuming the system allows it.
burl-16 58 =>limit
I assume it is working with np=8 because the 8 processes are getting
launched on the same node as mpirun and therefore there is no call to
qrsh to start up any remote processes. When you go beyond 8, mpirun
calls qrsh to start up processes on some of the remote nodes.
I would suggest first th
I think what you are looking for is this:
--mca plm_rsh_disable_qrsh 1
This means we will disable the use of qrsh and use rsh or ssh instead.
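For example (rank count and ./your_app are placeholders):
mpirun -np 16 --mca plm_rsh_disable_qrsh 1 ./your_app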
The --mca pls ^sge does not work anymore for two reasons. First, the
"pls" framework was renamed "plm". Secondly, the gridgengine plm was
folded into
As Lenny said, you should use the if_include parameter. Specifically,
it would look like this depending on which ones you want to select.
-mca btl_openib_if_include mthca0
or
-mca btl_openib_if_include mthca1
Rolf
On 07/15/09 09:33, nee...@crlindia.com wrote:
Thanks Ralph,
i foun
Ray Muno wrote:
Rolf Vandevaart wrote:
Ray Muno wrote:
Ray Muno wrote:
We are running a cluster using Rocks 5.0 and OpenMPI 1.2 (primarily).
Scheduling is done through SGE. MPI communication is over InfiniBand.
We also have OpenMPI 1.3 installed and receive
Ray Muno wrote:
Ray Muno wrote:
We are running a cluster using Rocks 5.0 and OpenMPI 1.2 (primarily).
Scheduling is done through SGE. MPI communication is over InfiniBand.
We also have OpenMPI 1.3 installed and receive similar errors.-
This does sound like a problem with SGE. By