libcephfs.h
Is this a possible future enhancement?
Thanks,
Eric
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
cb_alltoall' and worth: 'automatic'
Key is 'ind_rd_buffer_size' and worth: '4194304'
Key is 'ind_wr_buffer_size' and worth: '524288'
Key is 'romio_ds_read' and worth: 'automatic'
Key is 'romio_ds_write' and worth
module ompio
What else can I do to dig into this?
Are there parameters ompio is aware of for GPFS?
Thanks,
Eric
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
On 2022-06-10 16:23, Eric Chamberland via users wrote:
Hi,
I wa
"ompi_info --all" gives me...
I have tried this:
mpiexec --mca io romio321 --mca mca_verbose 1 --mca
mpi_show_mca_params 1 --mca io_base_verbose 1 ...
But I cannot see anything about io coming out...
With "ompi_info" I do...
Is it possible?
Thanks,
Eric
--
ons!
Eric
On 2022-06-01 23:31, Eric Chamberland via users wrote:
Hi,
In the past, we have successfully launched large (finite element)
computations using PARMetis as mesh partitioner.
It was first in 2012 with OpenMPI (v2.?) and then in March 2019
with OpenMPI 3.1.2 that we succ
problems test with OMPI-5.0.x?
Regarding the application, at some point it invokes MPI_Alltoallv,
sending more than 2GB to some of the ranks (using derived datatypes), right?
//WBR, Mikhail
From: users On Behalf Of Eric Chamberland via users
Sent: Thursday, June 2, 2022 5:31
ery specific call, but I am not sure it is sending 2GB to a specific
rank; maybe the 2GB is divided between many ranks. The fact is that this
part of the code, when it works, does not create such a bump in memory
usage... But I have to dig a bit more...
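To verify that, a small diagnostic before the MPI_Alltoallv call could print
the largest per-destination message (a minimal sketch; the names lSendCounts,
lSendType and lComm are assumptions, not the application's real variables):

/* Hedged sketch: compute the largest per-destination send size before
 * MPI_Alltoallv.  Variable names are illustrative only. */
#include <mpi.h>
#include <stdio.h>

void check_alltoallv_sizes(const int *lSendCounts, MPI_Datatype lSendType,
                           MPI_Comm lComm)
{
    int lNp, lRank, lTypeSize;
    MPI_Comm_size(lComm, &lNp);
    MPI_Comm_rank(lComm, &lRank);
    MPI_Type_size(lSendType, &lTypeSize);

    long long lMaxBytes = 0;
    for (int i = 0; i < lNp; ++i) {
        long long lBytes = (long long)lSendCounts[i] * lTypeSize;
        if (lBytes > lMaxBytes) lMaxBytes = lBytes;
    }
    if (lMaxBytes > 2147483647LL)   /* more than a 32-bit int can describe */
        printf("[rank %d] largest single send: %lld bytes\n", lRank, lMaxBytes);
}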
Regards,
Eric
//WBR, Mikhail
*Fro
"minimum reproducer" that triggers the bug, since it happens only on
"large" problems, but I think I could export the data for a 512-process
reproducer with the PARMetis call only...
Thanks for helping,
Eric
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
From: users On Behalf Of Eric Chamberland
via users
Sent: Thursday, September 23, 2021 9:28 AM
To: Open MPI Users
Cc: Eric Chamberland ; Vivien Clauzon
Subject: [OMPI users] Status of pNFS, CephFS and MPI I/O
Hi,
I am looking around for information about parallel filesystems supported for
MPI I
supported?
Thanks,
Eric
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
; on those error codes?
Thanks,
Eric
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
Thanks,
Eric
On 2019-09-30 at 3:34 p.m., Eric Chamberland via users wrote:
Hi,
I am using OpenMPI 3.1.2 with slurm 17.11.12 and it looks like I can't
have the "--output-filename" option taken into account. All my outputs
are going into slurm's output files.
Can it be imposed or ignored by a slurm configuration?
How is it possible to bypass that?
Strangely, the "--
On 25/04/17 04:36 PM, r...@open-mpi.org wrote:
add --oversubscribe to the cmd line
good, it works! :)
Is there an environment variable equivalent to the --oversubscribe argument?
I can't find this option in the related FAQ entries; should it be added
here?:
https://www.open-mpi.org/faq/?cat
Oh, forgot something important,
since OpenMPI 1.8.x I am using:
export OMPI_MCA_hwloc_base_binding_policy=none
Also, I am exporting this since 1.6.x?:
export OMPI_MCA_mpi_yield_when_idle=1
Eric
On 25/04/17 04:31 PM, Eric Chamberland wrote:
Ok, here it is:
===
first, with -n 8:
===
mpirun -mca ras_base_verbose 10 --display-allocation -n 8 echo "Hello"
[zorg:22429] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[zorg:22429] plm:base:set_hnp_name: initial bias 22429 nodename hash
810
ed by excluding the localhost RAS component by specifying
# the value "^localhost" [without the quotes] to the "ras" MCA
# parameter).
(15:53:52) [zorg]:~>
Thanks!
Eric
On 25/04/17 03:52 PM, r...@open-mpi.org wrote:
What is in your hostfile?
On Apr 25, 2017, at 11:39 AM, Eri
Hi,
just testing the 3.x branch... I launch:
mpirun -n 8 echo "hello"
and I get:
--
There are not enough slots available in the system to satisfy the 8 slots
that were requested by the application:
echo
Either request f
Excellent!
I will put it all in place, then try both URLs and see which one is
"manageable" for me!
Thanks,
Eric
On 22/06/16 02:10 PM, Jeff Squyres (jsquyres) wrote:
On Jun 22, 2016, at 2:06 PM, Eric Chamberland
wrote:
We have a similar mechanism already (that is used by th
On 22/06/16 01:49 PM, Jeff Squyres (jsquyres) wrote:
We have a similar mechanism already (that is used by the Open MPI community for nightly
regression testing), but with the advantage that it will give you a unique download
filename (vs. "openmpi-v2.x-latest.bz2" every night). Do this:
wget
Hi,
I would like to compile+test our code each night with the "latest"
openmpi v2 release (or the nightly, if stable enough).
Just to ease the process, I would like to "wget" the latest archive with
a "permanent" link...
Is it feasible for you to just put a symlink or something like it so I
On 2015-12-17 12:45, Jeff Squyres (jsquyres) wrote:
On Dec 17, 2015, at 8:57 AM, Eric Chamberland
wrote:
But I would like to know if the MPI I am using is able to do message
progression or not: so how can an end-user like me know that? Does it rely
on hardware? Is there a #define
Hi Gilles,
On 2015-10-21 20:31, Gilles Gouaillardet wrote:
#3 difficult question ...
first, keep in mind there is currently no progress thread in Open MPI.
that means messages can be received only when MPI_Wait* or MPI_Test* is
invoked. you might hope messages are received when doing some
com
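In practice that means the application has to poke the library itself; a
generic sketch of that pattern (not code from this thread; do_some_work()
is a placeholder) looks like:

/* Hedged sketch: drive message progression manually by calling MPI_Test
 * from the compute loop. */
#include <mpi.h>

void overlap_recv_with_work(void *buf, int count, MPI_Datatype type,
                            int src, int tag, MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;
    MPI_Irecv(buf, count, type, src, tag, comm, &req);
    while (!done) {
        /* do_some_work();  application work between progression polls */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* lets the library progress */
    }
}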
Hi Gilles and Josh,
I think my reply applies to both of your answers, which I thank you for.
On 21/10/15 08:31 PM, Gilles Gouaillardet wrote:
Eric,
#2 maybe not ...
a tree-based approach has O(log(n)) scaling
(compared to O(n) scaling with your linear method),
so at scale, MPI_Igather will hopeful
Hi,
A long time ago (in 2002) we implemented here a non-blocking MPI_Igather
with equivalent calls to MPI_Isend/MPI_Irecv (see the 2 attached files).
A very convenient advantage of this version is that I can do some work
on the root process as soon as it starts receiving data... Then, it wait
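For readers without the attachments, the general shape of such a hand-rolled
gather is roughly the following (a minimal sketch, not the 2002 code itself;
buffer layout and error handling are simplified):

/* Hedged sketch of a gather built from MPI_Irecv on the root, so rank 0
 * can start working while contributions arrive.  Illustrative names only. */
#include <mpi.h>
#include <stdlib.h>

void gather_ints(int *lLocal, int lNbInt, int *lAll, MPI_Comm lComm)
{
    int lNp, lRank;
    MPI_Comm_size(lComm, &lNp);
    MPI_Comm_rank(lComm, &lRank);

    if (lRank == 0) {
        MPI_Request *lReqs = malloc((lNp - 1) * sizeof(MPI_Request));
        for (int i = 1; i < lNp; ++i)
            MPI_Irecv(lAll + (long)i * lNbInt, lNbInt, MPI_INT, i, 0,
                      lComm, &lReqs[i - 1]);
        for (int j = 0; j < lNbInt; ++j)   /* root's own contribution */
            lAll[j] = lLocal[j];
        /* ...useful work can happen on the root here while data arrives... */
        MPI_Waitall(lNp - 1, lReqs, MPI_STATUSES_IGNORE);
        free(lReqs);
    } else {
        MPI_Send(lLocal, lNbInt, MPI_INT, 0, 0, lComm);
    }
}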
application.
See:
http://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy
-Nathan
On Thu, Feb 19, 2015 at 03:32:43PM -0500, Eric Chamberland wrote:
On 02/19/2015 02:58 PM, Nathan Hjelm wrote:
On Thu, Feb 19, 2015 at 12:16:49PM -0500
On 02/19/2015 03:53 PM, Nathan Hjelm wrote:
Great! I will add an MCA variable to force CMA and also enable it if 1)
no yama and 2) no PR_SET_PTRACER.
cool, thanks again!
You might also look at using xpmem. You can find a version that supports
3.x @ https://github.com/hjelmn/xpmem . It is a
On 02/19/2015 02:58 PM, Nathan Hjelm wrote:
On Thu, Feb 19, 2015 at 12:16:49PM -0500, Eric Chamberland wrote:
On 02/19/2015 11:56 AM, Nathan Hjelm wrote:
If you have yama installed you can try:
Nope, I do not have it installed... is it absolutely necessary? (and would
it change something
e
to pass any options to "mpicc" when compiling/linking an mpi application
to use cma?
Thanks,
Eric
echo 1 > /proc/sys/kernel/yama/ptrace_scope
as root.
-Nathan
On Thu, Feb 19, 2015 at 11:06:09AM -0500, Eric Chamberland wrote:
By the way,
I have tried two other things:
#1
wrote:
I recommend using vader for CMA. It has code to get around the ptrace
setting. Run with mca_btl_vader_single_copy_mechanism cma (should be the
default).
Ok, I tried it, but it gives exactly the same error message!
Eric
-Nathan
On Wed, Feb 18, 2015 at 02:56:01PM -0500, Eric Chamberland
Hi,
I have configured with "--with-cma" on 2 different OSes (RedHat 6.6 and
OpenSuse 12.3), but in both cases, I have the following error when
launching a simple mpi_hello_world.c example:
/opt/openmpi-1.8.4_cma/bin/mpiexec --mca btl_sm_use_cma 1 -np 2 /tmp/hw
--
On 01/14/2015 05:57 PM, Rob Latham wrote:
On 12/17/2014 07:04 PM, Eric Chamberland wrote:
Hi!
Here is a "poor man's fix" that works for me (the idea is not from me,
thanks to Thomas H.):
#1- char* lCwd = getcwd(0,0);
#2- chdir(lPathToFile);
#3- MPI_File_open(...,lFileNameWit
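Spelled out, the idea is simply the following (a hedged sketch; lPathToFile
and lFileNameWithoutPath are illustrative names, and chdir return codes are
not checked):

/* Hedged sketch of the chdir workaround described above. */
#include <mpi.h>
#include <unistd.h>
#include <stdlib.h>

int open_with_short_name(const char *lPathToFile,          /* directory part */
                         const char *lFileNameWithoutPath, /* short filename */
                         MPI_Comm comm, MPI_File *fh)
{
    char *lCwd = getcwd(0, 0);                 /* #1: remember where we were */
    chdir(lPathToFile);                        /* #2: go next to the file    */
    int err = MPI_File_open(comm, (char *)lFileNameWithoutPath,
                            MPI_MODE_RDONLY, MPI_INFO_NULL, fh);
    chdir(lCwd);                               /* restore the previous cwd   */
    free(lCwd);
    return err;
}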
since romio is currently imported
from mpich.
Cheers,
Gilles
On 2014/12/16 0:16, Eric Chamberland wrote:
Hi Gilles,
just created a very simple test case!
with this setup, you will see the bug with valgrind:
export
too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/b
Hi Gilles,
here is a simple setup to have valgrind complain now:
export
too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/h
takes between 1 and 10 characters
could you please give this patch a try and let us know the results?
Cheers,
Gilles
On 2014/12/15 11:52, Eric Chamberland wrote:
Hi again,
some new hints that might help:
1- With valgrind: If I run the same test case, same data, but
moved to a shorter
ed in some manner... at least an
error message!...
Thanks,
Eric
I am now checking if the overflow is correctly detected (that could
explain the one byte overflow reported by valgrind)
Cheers,
Gilles
On 2014/12/15 11:52, Eric Chamberland wrote:
Hi again,
some new hints that might help
h could be using shared memory on the same node?
Thanks,
Eric
On 12/14/2014 02:06 PM, Eric Chamberland wrote:
Hi,
I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for
my problem with collective MPI I/O.
A problem is still there. In this 2-process example, process rank 1
dies wit
Hi Gilles,
On 12/14/2014 09:20 PM, Gilles Gouaillardet wrote:
Eric,
can you make your test case (source + input file + howto) available so I
can try to reproduce and fix this?
I would like to, but the complete app is big (and not public), it is on top
of PETSc with mkl, and in C++... :-(
I can
Hi,
I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for
my problem with collective MPI I/O.
A problem is still there. In this 2-process example, process rank 1 dies
with a segfault while process rank 0 waits indefinitely...
Running with valgrind, I found these errors which m
On 12/10/2014 12:55 PM, Ralph Castain wrote:
Tarball now available on web site
http://www.open-mpi.org/nightly/v1.8/
I’ll run the tarball generator now so you can try the nightly tarball.
ok, retrieved openmpi-v1.8.3-236-ga21cb20 and it compiled, linked, and
executed nicely when oversubscri
On 12/10/2014 10:40 AM, Ralph Castain wrote:
You should be able to apply the patch - I don’t think that section of
code differs from what is in the 1.8 repo.
it compiles, links, but gives me a segmentation violation now:
#0 0x7f1827b00e91 in mca_allocator_component_lookup () from
/opt/ope
l try to dig into that notion a bit.
On Dec 9, 2014, at 10:39 AM, Eric Chamberland
wrote:
Hi again,
I sorted and "seded" (cat output.1.00 |sed 's/default/default value/g'|sed
's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
mpirun --ou
On 12/09/2014 04:19 PM, Nathan Hjelm wrote:
yield when idle is broken on 1.8. Fixing now.
ok, thanks a lot! will wait for the fix!
Eric
launch with 165 vs 183.
The diff may be interesting but I can't interpret everything that is
written...
The files are attached...
Thanks,
Eric
On 12/09/2014 01:02 PM, Eric Chamberland wrote:
On 12/09/2014 12:24 PM, Ralph Castain wrote:
Can you provide an example cmd line you use to l
:
"mpirun -np 32 myprog"
Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
Eric
On Dec 9, 2014, at 9:14 AM, Eric Chamberland
wrote:
Hi,
we used to oversubscribe just to do code validation in nightly
automated parallel runs of our co
Hi,
we used to oversubscribe just to do code validation in nightly
automated parallel runs of our code.
I just compiled openmpi 1.8.3 and launched the whole suite of
sequential/parallel tests and noticed a *major* slowdown in
oversubscribed parallel tests with 1.8.3 compared to 1.6.
Hi,
I have random segmentation violations (signal 11) in the mentioned
function when testing MPI I/O calls with 2 processes on a single
machine. Most of the time (1499/1500), it works perfectly.
here are the call stacks (for 1.6.3) on processes:
process 0:
==
Hi Ralph,
some new information about this "bug": we got a defective disk on this
computer! Then filesystem errors occurred... The disk was replaced 2 days
ago and everything seems to work well (the problem had re-occurred
since the last time I wrote about it).
Sorry for bothering!
Eric
Hi Ralph,
On 02/03/2014 04:20 PM, Ralph Castain wrote:
On Feb 3, 2014, at 1:13 PM, Eric Chamberland
wrote:
On 02/03/2014 03:59 PM, Ralph Castain wrote:
Very strange - even if you kill the job with SIGTERM, or have processes that
segfault, OMPI should clean itself up and remove those
promise would be to have the error message tell that there
is a file with the same name as the chosen directory?
Or add a new entry to the FAQ to help users find the workaround you
proposed... ;-)
thanks again!
Eric
HTH
Ralph
On Feb 3, 2014, at 12:31 PM, Eric Chamberland
wrote:
Hi,
On 02
hope this is the information you wanted... Is it?
Thanks,
Eric
On Feb 3, 2014, at 12:00 PM, Eric Chamberland
wrote:
On 02/03/2014 02:49 PM, Ralph Castain wrote:
Seems rather odd - is your /tmp by any chance network mounted?
No it is a "normal" /tmp:
"cd /tmp; df -h ." gives:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G   17G   30G  37% /
And there is plenty of disk space...
I agr
Hi,
with OpenMPI 1.6.3 I have encountered this error which "randomly" appears:
[compile:20089] opal_os_dirpath_create: Error: Unable to create the
sub-directory (/tmp/openmpi-sessions-cmpbib@compile_0/55528/0) of
(/tmp/openmpi-sessions-cmpbib@compile_0/55528/0/0), mkdir failed [1]
[compile:200
Hi,
I just opened a new "chapter" with the same subject. ;-)
We are using OpenMPI 1.6.5 (compiled with --enable-thread-multiple) with
Petsc 3.4.3 (on colosse supercomputer:
http://www.calculquebec.ca/en/resources/compute-servers/colosse). We
observed a deadlock with threads within the openib bt
On 05/22/2013 12:37 PM, Ralph Castain wrote:
Well, ROMIO was written by Argonne/MPICH (unfair to point the finger solely at
Rob) and picked up by pretty much everyone. The issue isn't a bug in MPIIO, but
rather
Ok, sorry about that!
thanks for the historical and technical information!
Eric
for all distributions...
It has been written by Rob Latham.
Maybe some developers could confirm this?
Eric
T. Rosmond
On Wed, 2013-05-22 at 11:21 -0400, Eric Chamberland wrote:
I have experienced the same problem... and worse, I have discovered a bug
in MPI I/O...
look here:
http://trac.mpich.org/projects/mpich/ticket/1742
and here:
http://www.open-mpi.org/community/lists/users/2012/10/20511.php
Eric
On 05/21/2013 03:18 PM, Tom Rosmond wrote:
Hello:
A colleague an
Hi,
I have a problem receiving a vector of a MPI_datatype constructed via
MPI_Type_create_struct.
It looks like MPI_Send or MPI_Recv doesn't work as expected: some parts
of a nested struct in the received buffer are not filled at all!
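For comparison, a pattern that usually works for such nested types (a generic
sketch with illustrative inner_t/outer_t types, not the code from the report)
computes every displacement with MPI_Get_address and resizes each type to its
true extent:

/* Hedged sketch: an MPI datatype for a struct that contains a struct. */
#include <mpi.h>

typedef struct { double x; int id; } inner_t;
typedef struct { inner_t in; double w; } outer_t;

MPI_Datatype make_outer_type(void)
{
    int blk[2] = {1, 1};
    MPI_Aint disp[2], base;
    MPI_Datatype types[2], inner, inner_r, outer, outer_r;
    inner_t i_ex;
    outer_t o_ex;

    /* inner struct: offsets of x and id relative to the struct start */
    types[0] = MPI_DOUBLE; types[1] = MPI_INT;
    MPI_Get_address(&i_ex, &base);
    MPI_Get_address(&i_ex.x, &disp[0]);
    MPI_Get_address(&i_ex.id, &disp[1]);
    disp[0] -= base; disp[1] -= base;
    MPI_Type_create_struct(2, blk, disp, types, &inner);
    MPI_Type_create_resized(inner, 0, sizeof(inner_t), &inner_r);

    /* outer struct: the first member uses the (resized) inner type */
    types[0] = inner_r; types[1] = MPI_DOUBLE;
    MPI_Get_address(&o_ex, &base);
    MPI_Get_address(&o_ex.in, &disp[0]);
    MPI_Get_address(&o_ex.w, &disp[1]);
    disp[0] -= base; disp[1] -= base;
    MPI_Type_create_struct(2, blk, disp, types, &outer);
    MPI_Type_create_resized(outer, 0, sizeof(outer_t), &outer_r);
    MPI_Type_commit(&outer_r);
    return outer_r;   /* usable with MPI_Send/MPI_Recv of outer_t arrays */
}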
I tested the code under mpich 3.0.3 and it worked perf
le-g=yes
So, is this a wrong "assert" in openmpi?
Is there a real problem with using this code in "release" mode?
Thanks,
Eric
On 04/05/2013 12:57 PM, Eric Chamberland wrote:
Hi all,
I have a well working (large) code which is using openmpi 1.6.3 (see
config.log here:
http://www.giref.ulaval.ca/~ericc/bug_openmpi/config.log_nodebug)
(I have used it for reading with MPI I/O with success over 1500 procs
with very large files)
However, when I use openmpi compiled
On 01/21/2013 01:00 PM, Reuti wrote:
although you can create such files in Linux, it's not portable.
http://en.wikipedia.org/wiki/Filename (Reserved characters and words)
Best is to use only characters from POSIX portable character set for filenames. Especially as this
syntax with a colon is u
Hi,
If you try to open a file with a ":" in the filename (ex: "file:o"), you
get an MPI_ERR_NO_SUCH_FILE.
ERROR Returned by MPI: 42
ERROR_string Returned by MPI: MPI_ERR_NO_SUCH_FILE: no such file or
directory
Just launch the simple test code attached to see the problem.
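In essence the reproducer boils down to something like this (a minimal
sketch along the same lines, not the attached code itself):

/* Hedged sketch: open a file whose name contains a colon and print the
 * resulting error string. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    char msg[MPI_MAX_ERROR_STRING];
    int len, err;

    MPI_Init(&argc, &argv);
    err = MPI_File_open(MPI_COMM_WORLD, "file:o",
                        MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &len);
        printf("MPI_File_open failed: %s\n", msg);
    } else {
        MPI_File_close(&fh);
    }
    MPI_Finalize();
    return 0;
}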
MPICH has the same
the right direction, but I am not an "expert"...
some expert advice would be welcome.
Eric
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Dec 3, 2012, at 7:12 PM, Eric Chamberland wrote:
On 12/03/2012 05:37 PM, Brock Palen wrote:
I was trying to use hints with ROMIO and lustre prompted by another post on
this list.
I have a simple MPI-IO code and, using the notes I found, I cannot set the
lustre striping using the config file and setting ROMIO_HINTS.
Question:
How can I chec
On 12/03/2012 03:23 AM, pascal.dev...@bull.net wrote:
try with:
striping_unit 1048576
striping_factor 16
(stripe_size means striping_unit and stripe_count means striping_factor)
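For what it's worth, the same two hints can also be set from the code through
an MPI_Info and read back with MPI_File_get_info to see what the library
actually kept (a hedged sketch; whether the hints take effect still depends on
the ROMIO Lustre driver):

/* Hedged sketch: set Lustre striping hints at open time and verify them. */
#include <mpi.h>
#include <stdio.h>

void open_with_striping(const char *fname, MPI_Comm comm)
{
    MPI_Info info, used;
    MPI_File fh;
    char val[MPI_MAX_INFO_VAL + 1];
    int flag;

    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_unit",   "1048576");  /* stripe size in bytes */
    MPI_Info_set(info, "striping_factor", "16");       /* number of stripes    */

    MPI_File_open(comm, (char *)fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  info, &fh);
    MPI_File_get_info(fh, &used);                       /* hints actually used  */
    MPI_Info_get(used, "striping_factor", MPI_MAX_INFO_VAL, val, &flag);
    if (flag) printf("striping_factor in effect: %s\n", val);

    MPI_Info_free(&used);
    MPI_Info_free(&info);
    MPI_File_close(&fh);
}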
Shame on me! ;-)
Thank you, it works perfectly now!...
Eric
Hi,
I am using openmpi 1.6.3 with lustre. I can change the stripe count via
"striping_unit" but if I try to change the stripe size via
"striping_factor", all my options are ignored and fall back on the
default values.
Here is what I do:
=
setenv ROMIO_HINTS /home/
http://www.open-mpi.org/community/lists/users/2010/11/14816.php
Also, not MPI but C.
I wonder if you need to declare "size" as 'long int',
or maybe 'long long int', to represent/hold correctly
the large value that you want
(360,000,000,000 > 2,147,483,647).
I hope this helps.
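Note also that the count argument of MPI_File_write_all is a plain int, so
even with a 64-bit size variable the call cannot take 360 000 000 000 elements
directly. One hedged workaround (a sketch that assumes the total is a multiple
of the block size, and that may still hit ROMIO-internal 32-bit limits) is to
fold the longs into a contiguous derived datatype so the count stays small:

/* Hedged sketch: write a huge number of longs with a count that fits in an
 * int, by grouping them into a contiguous derived datatype. */
#include <mpi.h>

int write_many_longs(MPI_File fh, const long *buf, long long nlongs)
{
    const long long block = 1000000;             /* longs per derived element */
    MPI_Datatype chunk;
    MPI_Type_contiguous((int)block, MPI_LONG, &chunk);
    MPI_Type_commit(&chunk);

    long long nchunks = nlongs / block;           /* remainder handling omitted */
    int err = MPI_File_write_all(fh, (void *)buf, (int)nchunks, chunk,
                                 MPI_STATUS_IGNORE);
    MPI_Type_free(&chunk);
    return err;
}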
Hi,
I get this error when trying to write 360 000 000 000 MPI_LONG:
with Openmpi-1.4.5:
ERROR Returned by MPI_File_write_all: 35
ERROR_string Returned by MPI_File_write_all: MPI_ERR_IO: input/output error
with Openmpi-1.6.2:
ERROR Returned by MPI_File_write_all: 13
ERROR_string Returned by MPI_
On 2012-03-09 11:16, Jeffrey Squyres wrote:
Sorry for the delay. Answers inline.
No problem, thank you for taking the time to read the long example...
#4- MPI_WAIT_ANY_VERSION always received the data from processes on the same
host.
I'm not sure what you mean by that statement.
Sorry,
Hi,
I would like to know which of "waitone" vs "waitany" is optimal and of
course, will never produce deadlocks.
Let's say we have "lNp" processes and they want to send an array of int
of length "lNbInt" to process "0" in a non-blocking MPI_Isend (instead
of MPI_Gather). Let's say the order
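For reference, the "waitany" flavour on rank 0 usually has this shape (a
generic sketch reusing the illustrative names lNp and lNbInt, not the code
from the original question):

/* Hedged sketch: rank 0 posts lNp-1 receives and processes each message as
 * soon as MPI_Waitany reports it complete, in whatever order they arrive. */
#include <mpi.h>
#include <stdlib.h>

void root_receive_any_order(int *lAll, int lNbInt, int lNp, MPI_Comm comm)
{
    MPI_Request *lReqs = malloc((lNp - 1) * sizeof(MPI_Request));
    for (int i = 1; i < lNp; ++i)
        MPI_Irecv(lAll + (long)i * lNbInt, lNbInt, MPI_INT, i, 0,
                  comm, &lReqs[i - 1]);

    for (int k = 0; k < lNp - 1; ++k) {
        int idx;
        MPI_Waitany(lNp - 1, lReqs, &idx, MPI_STATUS_IGNORE);
        /* process the chunk from rank idx+1 as soon as it has arrived */
    }
    free(lReqs);
}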