Hi,
I was compiling OpenMPI 1.8.1 today and I noticed that pretty much every
single scheduler has its support enabled by default at configure (except
the one I need, which is Torque). Is there a reason for that? Why not
have a single scheduler enabled and require specifying it at configure
time? [...] it is compiling support for many schedulers, while I'm
rather convinced that very few sites actually use multiple schedulers at
the same time.
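For what it's worth, the kind of configure line I have in mind would be
something like this (paths are placeholders; the exact flag names should be
checked against ./configure --help):

  ./configure --prefix=/opt/openmpi-1.8.1 \
      --with-tm=/opt/torque \
      --without-slurm --without-lsf --without-sge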
Maxime
On 2014-05-14 16:51, Gus Correa wrote:
On 05/14/2014 04:25 PM, Maxime Boissonneault wrote:
Hi,
I was compiling OpenMPI 1.8.1 today [...]
[...] build them. So we do.
In general, OMPI builds support for everything that it can find on the rationale that a) we can't
know ahead of time exactly what people want, and b) most people want to just "./configure
&& make -j 32 install" and be done with it -- so build as much as possible.
[...] is sane and reasonable. Just in case
you're taking a ha'penny's worth from the groundlings: I think I would
prefer not to have capability included that we won't use.
-- bennet
On Wed, May 14, 2014 at 7:43 PM, Maxime Boissonneault
wrote:
For the scheduler issue, I would be happy [...]
[...] list of
things to build, so any work toward that scheme might not be lost.
-- bennet
On Thu, May 15, 2014 at 7:41 AM, Maxime Boissonneault
wrote:
On 2014-05-15 06:29, Jeff Squyres (jsquyres) wrote:
I think Ralph's email summed it up pretty well -- we unfortunately have
(at least) two [...]
Please allow me to chip in my $0.02 and suggest not reinventing the
wheel, but instead considering a migration of the build system to CMake:
http://www.cmake.org/
I agree that menu-wise, CMake does a pretty good job with ccmake, and is
much, much easier to create than autoconf/automake/m4 stuff (
On 2014-05-15 18:27, Jeff Squyres (jsquyres) wrote:
On May 15, 2014, at 6:14 PM, Fabricio Cannini wrote:
Alright, but now I'm curious as to why you decided against it.
Could you please elaborate on it a bit?
OMPI has a long, deep history with the GNU Autotools. It's a very long,
complicated
On 2014-05-16 09:06, Jeff Squyres (jsquyres) wrote:
On May 15, 2014, at 8:00 PM, Fabricio Cannini wrote:
Nobody is disagreeing that one could find a way to make CMake work - all we are
saying is that (a) CMake has issues too, just like autotools, and (b) we have
yet to see a compelling reason [...]
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
pi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/openmpi/mca-coll-ml.config",
data source: default, level: 9 dev/all, type: string)
MCA io: informational
"io_romio_complete_configure_params" (current value:
"--with-file-system=nfs+lustre FROM_OMPI=yes
CC='/software6/compilers/gcc/4.8/bin/gcc -std=gnu99' CFLAGS='-O3
-DNDEBUG -finline-functions -fno-strict-aliasing -pthread' CPPFLAGS='
-I/software-gpu/src/openmpi-1.8.1/opal/mca/hwloc/hwloc172/hwloc/include
-I/software-gpu/src/openmpi-1.8.1/opal/mca/event/libevent2021/libevent
-I/software-gpu/src/openmpi-1.8.1/opal/mca/event/libevent2021/libevent/include'
FFLAGS='' LDFLAGS=' ' --enable-shared --enable-static
--with-file-system=nfs+lustre
--prefix=/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37
--disable-aio", data source: default, level: 9 dev/all, type: string)
[login-gpu01.calculquebec.ca:11486] mca: base: close: unloading component Q
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
[We do not] have the Mellanox OFED. We use the Linux RDMA stack from
CentOS 6.5. However, should that completely disable GDR within a single
node? I.e., does GDR _have_ to go through IB? I would assume that our
lack of Mellanox OFED would result in no GDR inter-node, but GDR intra-node.
Thanks
Hi,
I've been following this thread because it may be relevant to our setup.
Is there a drawback to having orte_hetero_nodes=1 as a default MCA
parameter? Is there a reason why the most generic case is not assumed?
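If it is safe, I would simply set it in the system-wide MCA parameter file,
e.g. in $PREFIX/etc/openmpi-mca-params.conf:

  orte_hetero_nodes = 1

assuming that file is the right place for a site-wide default.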
Maxime Boissonneault
On 2014-06-20 13:48, Ralph Castain wrote:
What were your threading options for OpenMPI when it was built?
I have seen the OpenIB BTL completely lock up before when some level of
threading is enabled.
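You can check what the installation was built with using something like:

  ompi_info | grep -i thread

which should report the supported thread level.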
Maxime Boissonneault
On 2014-06-24 18:18, Fischer, Greg A. wrote:
Hello openmpi-users,
A few weeks ago, I posted to the list about [...]
[...] HCA:
[binf102:fischega] $ /usr/sbin/ibstat
CA 'mlx4_0'
CA type: MT26428
Command line (path and LD_LIBRARY_PATH are set correctly):
mpirun -x LD_LIBRARY_PATH -mca btl openib,sm,self -mca
btl_openib_verbose 1 -np 31 $CTF_EXEC
[...] much outweigh the changes
that need to be made to get them.
My 2 cents,
Maxime Boissonneault
My two cents of opinion
Gus Correa
On 08/05/2014 12:54 PM, Ralph Castain wrote:
Check the repo - hasn't been touched in a very long time
On Aug 5, 2014, at 9:42 AM, Fabricio Cannini wrote:
_SUCCESS
--
What is weird is that this same command works for other users, on the
same node.
Does anyone know what might be going on here?
Thanks,
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
Note that if I do the same build with OpenMPI 1.6.5, it works flawlessly.
Maxime
On 2014-08-14 08:39, Maxime Boissonneault wrote:
Hi,
I compiled Charm++ 6.6.0rc3 using
./build charm++ mpi-linux-x86_64 smp --with-production
When compiling the simple example
mpi-linux-x86_64-smp/tests/charm
ar Fabian Mojica Ladino
Geologist M.S. in Geophysics
/software/ompi/v1.8/
On Aug 14, 2014, at 8:39 AM, Maxime Boissonneault
wrote:
Hi,
I compiled Charm++ 6.6.0rc3 using
./build charm++ mpi-linux-x86_64 smp --with-production
When compiling the simple example
mpi-linux-x86_64-smp/tests/charm++/simplearrayhello/
I get a segmentation fault that
Hi,
I ran GROMACS successfully with OpenMPI 1.8.1 and CUDA 6.0.37 on a
single node, with 8 ranks and multiple OpenMP threads.
Maxime
On 2014-08-14 14:15, Joshua Ladd wrote:
Hi, Maxime
Just curious, are you able to run a vanilla MPI program? Can you try
one of the example programs in [...]
[...] is coming from the OpenIB BTL, it
would be good to check this.
Do you know what the MPI thread level is set to when used with the
Charm++ runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is not
thread safe.
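One quick way to check from inside an application -- a minimal standalone
sketch, not taken from Charm++ itself:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      int provided;
      MPI_Init(&argc, &argv);
      MPI_Query_thread(&provided);   /* thread level actually provided */
      printf("provided = %d, MPI_THREAD_MULTIPLE = %d\n",
             provided, MPI_THREAD_MULTIPLE);
      MPI_Finalize();
      return 0;
  }

If Charm++ initializes MPI itself, the MPI_Query_thread call can instead be
dropped into the application after startup.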
Josh
On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boissonneault wrote:
15:16, Joshua Ladd wrote:
Can you try to run the example code "ring_c" across nodes?
Josh
On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault wrote:
Yes,
Everything has been built with GCC 4.8.x, although x might [...]
I will recompile it from scratch and provide all the information
requested on the help webpage.
Cheers,
Maxime
On 2014-08-15 11:58, Maxime Boissonneault wrote:
Hi Josh,
The ring_c example does not work on our login node:
[mboisson@helios-login1 examples]$ mpiexec -np 10 ring_c
[mboisson
[...] wrote:
And maybe include your LD_LIBRARY_PATH
Josh
On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd wrote:
Can you try to run the example code "ring_c" across nodes?
Josh
On Thu, Aug 14, 2014 at
login
node if I understood you correctly.
Josh
On Fri, Aug 15, 2014 at 5:20 PM, Maxime Boissonneault
wrote:
Here are the requested files.
In the archive, you will find the output of configure, make, and make
install, as well as [...]
Hi Jeff,
On 2014-08-15 17:50, Jeff Squyres (jsquyres) wrote:
On Aug 15, 2014, at 5:39 PM, Maxime Boissonneault
wrote:
Correct.
Could it be because Torque (pbs_mom) is not running on the head node and mpiexec
attempts to contact it?
Not for Open MPI's mpiexec, no.
Open MPI'
?
On Aug 15, 2014, at 6:47 PM, Maxime Boissonneault
wrote:
echo $?
65
Maxime
On 2014-08-16 06:22, Jeff Squyres (jsquyres) wrote:
Just out of curiosity, I saw that one of the segv stack traces involved the
CUDA stack.
Can you try a build without CUDA and see if that resolves the problem?
On Aug 15, 2014, at 6:47 PM, Maxime Boissonneault
wrote:
Here it is
On 2014-08-18 12:30, Joshua Ladd wrote:
mpirun -np 4 --mca plm_base_verbose 10
[mboisson@helios-login1 examples]$ mpirun -np 4 --mca plm_base_verbose
10 ring_c
[helios-login1:27853] mca: base: components_register: registering plm
components
[helios-login1:27853] mca: base: compone
[This is all on] one node, yes?
Try adding the following:
-mca odls_base_verbose 5 -mca state_base_verbose 5 -mca errmgr_base_verbose 5
Lot of garbage, but should tell us what is going on.
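Putting that together with the earlier command line gives something like:

  mpirun -np 4 --mca plm_base_verbose 10 -mca odls_base_verbose 5 \
      -mca state_base_verbose 5 -mca errmgr_base_verbose 5 ring_c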
On Aug 18, 2014, at 9:36 AM, Maxime Boissonneault
wrote:
Here it is.
Maxime
On 2014-08-18 12:59, Ralph Castain wrote:
Ah...now that showed the problem. To pinpoint it better, please add
-mca oob_base_verbose 10
and I think we'll have it
On Aug 18, 2014, at 9:54 AM, Maxime Boissonneault
wrote:
This is all on one node indeed.
Attached [...]
param would be the solution.
On Aug 18, 2014, at 10:04 AM, Maxime Boissonneault
wrote:
Here it is.
Maxime
On 2014-08-18 12:59, Ralph Castain wrote:
Ah...now that showed the problem. To pinpoint it better, please add
-mca oob_base_verbose 10
and I think we'll have it
On Aug 18, 201
need to get
fixed. We haven't had many cases where it's been an issue, but a couple like
this have cropped up - enough that I need to set aside some time to fix it.
My apologies for the problem.
On Aug 18, 2014, at 10:31 AM, Maxime Boissonneault
wrote:
Indeed, that makes sense now.
[...] would ask here if somebody has a clue of what might be going on. I have
yet to be able to file a bug report on NVIDIA's website for CUDA.
Thanks,
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
other node)
Maxime
On 2014-08-18 16:52, Alex A. Granovsky wrote:
Try the following:
export MALLOC_CHECK_=1
and then run it again
Kind regards,
Alex Granovsky
-Original Message- From: Maxime Boissonneault
Sent: Tuesday, August 19, 2014 12:23 AM
To: Open MPI Users
Subject: [OMPI
trace?
-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
Boissonneault
Sent: Monday, August 18, 2014 4:23 PM
To: Open MPI Users
Subject: [OMPI users] Segfault with MPI + Cuda on multiple nodes
Hi,
Since my previous thread (Segmentation fault in OpenMPI
[...] Alex A. Granovsky wrote:
Also, you need to check the return code from cudaMalloc before calling
cudaFree -- the pointer may be invalid if you did not initialize CUDA
properly.
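Something along these lines -- a minimal standalone sketch, not taken from the
code under discussion:

  #include <stdio.h>
  #include <cuda_runtime.h>

  int main(void) {
      void *buf = NULL;
      cudaError_t err = cudaMalloc(&buf, 1 << 20);   /* 1 MB device buffer */
      if (err != cudaSuccess) {
          fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
          return 1;   /* do not call cudaFree on an invalid pointer */
      }
      cudaFree(buf);
      return 0;
  }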
Alex
-Original Message- From: Maxime Boissonneault
Sent: Tuesday, August 19, 2014 2:19 AM
To: Open MPI Users
Subject: Re
[To] help reduce the scope of the problem, can you retest with a
non-CUDA-aware Open MPI 1.8.1? And if possible, use --enable-debug in the
configure line to help with the stack trace?
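In other words, rebuild with the --with-cuda flag dropped and --enable-debug
added, roughly:

  ./configure --enable-debug --prefix=/path/to/test/install ...

keeping the rest of the original configure arguments (the prefix here is just
a placeholder).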
-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
Boissonneault
Sent
-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
Boissonneault
Sent: Tuesday, August 19, 2014 8:55 AM
To: Open MPI Users
Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes
Hi,
I recompiled OMPI 1.8.1 without CUDA and with debug, but it did not
very friendly way to handle that error.
-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
Boissonneault
Sent: Tuesday, August 19, 2014 10:39 AM
To: Open MPI Users
Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes
Hi,
I believe I found
Hi,
Would you say that software compiled using OpenMPI 1.8.1 needs to be
recompiled using OpenMPI 1.8.2rc4 to work properly?
Maxime
[...] Remember that MPI ranks begin
with 0, not 1.
Please correct the command line and try again.
How can I debug that?
Thanks,
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
It looks like
-npersocket 1
cannot be used alone. If I do
mpiexec -npernode 2 -npersocket 1 ls -la
then I get no error message.
Is this expected behavior?
Maxime
On 2014-08-29 11:53, Maxime Boissonneault wrote:
Hi,
I am having a weird error with OpenMPI 1.6.3. I run a non-MPI command
[...] - but we aren't really maintaining the 1.6 series any
more. You might try updating to 1.6.5 and see if it remains there
On Aug 29, 2014, at 9:12 AM, Maxime Boissonneault
wrote:
It looks like
-npersocket 1
cannot be used alo
Hi,
You need to compile OpenMPI with --enable-mpi-thread-multiple. However,
OpenMPI used to have problems with that level of threading. Is that still
the case in the 1.8 series? I know that in the 1.6 series it was a no-go;
it caused all sorts of hangs in the openib BTL.
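Concretely, that means a configure line along the lines of:

  ./configure --enable-mpi-thread-multiple --prefix=/path/to/install ...

with the rest of your usual options (the prefix is a placeholder).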
If the problems are not solved [...]
Hi,
Just an idea here. Do you use cpusets within Torque? Did you request
enough cores from Torque?
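For example, something like

  qsub -l nodes=2:ppn=16

so that the cpuset covers every core the MPI processes will try to bind to
(the node and ppn counts here are just placeholders).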
Maxime Boissonneault
On 2014-09-23 13:53, Brock Palen wrote:
I found a fun head-scratcher: with OpenMPI 1.8.2 built with TM support
against Torque 5, on hetero core layouts I get the fun [...]
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
?
Attached is the output of my configure and make lines.
Thanks,
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
config-make.log.tar.gz
I figured it out. It seems that setting CPP to pgprepro wasn't the right
thing to do.
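For reference, a PGI build normally just passes the compilers and lets
configure work out the preprocessor on its own, roughly:

  ./configure CC=pgcc CXX=pgc++ FC=pgfortran ...

(compiler names as shipped with PGI 14.x; adjust to your installation).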
Thanks,
Maxime
On 2014-10-03 10:39, Maxime Boissonneault wrote:
Hi,
I am trying to compile OpenMPI 1.8.3 with PGI 14.9 and I am getting
severe errors here:
1956 PGC-S-0039-Use of undeclared variable [...]
[...] would reach tens of thousands and would completely overload our
Lustre filesystem. Moreover, at 15 MB/s per node, the checkpointing
process would take hours.
How can I improve on that? Is there an MCA setting that I am missing?
Thanks,
--
-
Maxime
was done since the last checkpoint.
HTH
Ralph
On Jan 28, 2013, at 7:47 AM, Maxime Boissonneault
wrote:
Hello,
I am doing checkpointing tests (with BLCR) with an MPI application compiled
with OpenMPI 1.6.3, and I am seeing behaviors that are quite strange.
First, some details about the tests
On 2013-01-28 12:46, Ralph Castain wrote:
On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault
wrote:
Hello Ralph,
I agree that ideally, someone would implement checkpointing in the application
itself, but that is not always possible (commercial applications, use of
complicated libraries
On 2013-01-28 13:15, Ralph Castain wrote:
On Jan 28, 2013, at 9:52 AM, Maxime Boissonneault
wrote:
On 2013-01-28 12:46, Ralph Castain wrote:
On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault
wrote:
Hello Ralph,
I agree that ideally, someone would implement checkpointing in the
[...] Have you run some parallel
benchmarks on your cluster?
George.
PS: You can find some MPI I/O benchmarks at
http://www.mcs.anl.gov/~thakur/pio-benchmarks.html
On Mon, Jan 28, 2013 at 2:04 PM, Ralph Castain wrote:
On Jan 28, 2013, at 10:53 AM, Maxime Boissonneault
wrote:
On 2013-01-28 13:15 [...]
On 2013-01-29 21:02, Ralph Castain wrote:
On Jan 28, 2013, at 10:53 AM, Maxime Boissonneault
wrote:
While our filesystem and management nodes are on UPS, our compute
nodes are not. With one average generic (power/cooling mostly)
from rank 0 followed by MPI_Isend to rank 0
In this case also, rank n's MPI_Isend executes quasi-instantaneously,
and rank 0's MPI_Recv only returns a few minutes later.
Thanks,
Maxime Boissonneault
[...] following the MPI_Recv delays the
acknowledgement? If so, is this the intended behavior, and why isn't the
same thing observed with the TCP BTL?
Maxime Boissonneault
On 2013-02-14 11:50, Maxime Boissonneault wrote:
Hi,
I have a strange case here. The application is [...]
quite low.
The fact that MVAPICH2 does not show this behavior points to a problem
with the openib BTL within OpenMPI, and not with our setup.
Can anyone try to reproduce this on a different machine?
Thanks,
Maxime Boissonneault
On 2013-02-15 14:29, Maxime Boissonneault wrote:
Hi again,
config.log, environment
variables on the execution node, both make.out, sample code and output
etc.
Thanks,
Jeff
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
Hi,
Do you have thread-multiple support enabled in your OpenMPI installation?
Maxime Boissonneault
On 2013-12-16 17:40, Noam Bernstein wrote:
Has anyone tried to use openmpi 1.7.3 with the latest CentOS kernel
(well, nearly latest: 2.6.32-431.el6.x86_64), and especially with infiniband?
I
I heard that checkpoint/restart (C/R) support in OpenMPI was being dropped
after the 1.6.x series. Is that not still the case?
Maxime Boissonneault
On 2014-02-27 13:09, George Bosilca wrote:
Both were supported at some point. I'm not sure if either is still in a
workable state in the trunk today. However, the [...]
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
On 2015-04-13 09:54, Ralph Castain wrote:
On Apr 13, 2015, at 6:52 AM, Maxime Boissonneault
wrote:
Just out of curiosity... how will OpenMPI start processes under different
accounts? Through SSH while specifying different user names?
I am assuming that no resource manager or scheduler [...]
[If the] bindings had been similar to Boost.MPI, they would
probably have been adopted more widely and might still be alive.
--
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
President - Research Support Coordination Committee of Calcul Québec