What file system are you running your code on? And is the same directory
shared across all nodes? I have seen this error when users try to use a
non-shared directory for MPI I/O operations (e.g. /tmp, which is a different
drive/folder on each node).
Thanks
Edgar
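As a minimal sketch of what that means in practice (the path
/shared/scratch/testfile below is just a placeholder for a directory mounted
on every node), all ranks must open the very same path:

    /* every rank opens the SAME file; if the directory is node-local
     * (e.g. /tmp), each node sees a different file and MPI I/O breaks */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File fh;
        int rc = MPI_File_open(MPI_COMM_WORLD, "/shared/scratch/testfile",
                               MPI_MODE_CREATE | MPI_MODE_WRONLY,
                               MPI_INFO_NULL, &fh);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING]; int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: MPI_File_open failed: %s\n", rank, msg);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        int val = rank;
        MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int), &val, 1,
                          MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }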
-Original Message-
From: users On Behalf Of Gabriel, Edgar via users
Sent: Thursday, September 23, 2021 5:31 PM
To: Eric Chamberland; Open MPI Users
Cc: Gabriel, Edgar; Louis Poirel; Vivien Clauzon
Subject: Re: [OMPI users] Status of pNFS, CephFS and MPI I/O
-Original Message-
From: Eric Chamberland
Thanks for your answer Edgar!
In fact, we are able to use NFS and certainly any POSIX file system on a
single-node basis.
I should have been asking: what are the supported file systems for
*multi-node* read/write access to files?
Eric,
generally speaking, ompio should be able to operate correctly on all file
systems that have support for POSIX functions. The generic ufs component is,
for example, being used on BeeGFS parallel file systems without problems; we
are using that on a daily basis. For GPFS, the only reason we [...]
work on an update of the FAQ section.
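As a usage sketch (the component names are assumptions that vary between
release series; check with ompi_info for your installation), you can see which
MPI I/O components were built and pin one explicitly:

    ompi_info | grep "MCA io"              # lists e.g. ompio and romio321
    mpirun --mca io ompio    -np 16 ./app  # force ompio (./app is a placeholder)
    mpirun --mca io romio321 -np 16 ./app  # force ROMIO (component name differs in older releases)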
-Original Message-
From: users On Behalf Of Dave Love via users
Sent: Monday, January 18, 2021 11:14 AM
To: Gabriel, Edgar via users
Cc: Dave Love
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre
"Gabriel, Edgar via users&quo
I would like to correct one of my statements:
-Original Message-
From: users On Behalf Of Gabriel, Edgar via users
Sent: Friday, January 15, 2021 7:58 AM
To: Open MPI Users
Cc: Gabriel, Edgar
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre
> The entire infrastructure
-Original Message-
From: users On Behalf Of Dave Love via users
Sent: Friday, January 15, 2021 4:48 AM
To: Gabriel, Edgar via users
Cc: Dave Love
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre
> How should we know that's expected to fail? It at least shouldn
I will have a look at those tests. The recent fixes were not correctness
fixes, but performance fixes.
Nevertheless, we used to pass the mpich tests; I admit that it is not a test
suite that we run regularly, so I will have a look at them. The atomicity
tests are expected to fail, since this is the one c
the reasons for potential performance issues on NFS are very different from
those on Lustre. Basically, depending on your use case and the NFS
configuration, you have to enforce a different locking policy to ensure
correct output files. The default value chosen for ompio is the most
conservative setting
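As a hedged pointer for the NFS case (the parameter name below is an
assumption from memory; verify it with ompi_info for your release), the
locking behaviour of the generic ufs component is exposed as an MCA parameter:

    # show the lock-related parameters of the fs/ufs component
    ompi_info --param fs ufs --level 9
    # e.g. force range locking, assuming the parameter is called
    # fs_ufs_lock_algorithm in your release
    mpirun --mca fs_ufs_lock_algorithm 3 -np 8 ./app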
I will have a look at the t_bigio tests on Lustre with ompio. We had some
reports from collaborators about performance problems similar to the one that
you mention here (which was the reason we were hesitant to make ompio the
default on Lustre), but part of the problem is that we were not [...]
the --with-lustre option twice, once inside the
"--with-io-romio-flags=" (along with the option that you provided), and once
outside (for ompio).
Thanks
Edgar
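For what it is worth, a sketch of the configure invocation described above
(the install prefix and the extra ROMIO flags are placeholders):

    ./configure --prefix=/opt/openmpi \
        --with-lustre \
        --with-io-romio-flags="--with-lustre <your-other-romio-flags>"

Passing --with-lustre outside enables the ompio Lustre support; passing it
inside --with-io-romio-flags hands it to ROMIO's own configure.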
-Original Message-
From: Mark Dixon
Sent: Monday, November 16, 2020 8:19 AM
To: Gabriel, Edgar via users
Cc: Gabriel, Edgar
this is in theory still correct: the default MPI I/O library used by Open MPI
on Lustre file systems is ROMIO in all release versions. That being said, ompio
does have support for Lustre as well, starting from the 2.1 series, so you can
use that instead. The main reason that we did not switch to
the ompio software infrastructure has multiple frameworks:
fs framework: abstracts out file-system-level operations (open, close, etc.)
fbtl framework: provides the abstractions and implementations of *individual*
file I/O operations (seek, read, write, iread, iwrite)
fcoll framework: provides the abstractions and implementations of *collective*
file I/O operations
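A quick way to see which components of these frameworks were built into your
installation (a usage sketch):

    ompi_info | grep -E "MCA (fs|fbtl|fcoll)"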
Your code looks correct, and based on your output I would actually suspect that
the I/O part finished correctly; the error message that you see is not an I/O
error, but comes from the btl (which is communication related).
What version of Open MPI are you using, and on what file system?
Thanks
Edgar
-
The one test that would give you a good idea of the upper bound for your
scenario would be to write a benchmark where each process writes to a
separate file, and to look at the overall bandwidth achieved across all
processes. The MPI I/O performance will be less than or equal to the bandwidth
achieved by this test.
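A sketch of such an upper-bound benchmark (the file name prefix and buffer
size are placeholders; note that the page cache can inflate the numbers unless
the volume written is much larger than RAM):

    /* each rank writes its own file with plain POSIX I/O; rank 0 reports
     * the aggregate bandwidth, which bounds what MPI I/O can reach */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NBYTES (256UL * 1024 * 1024)   /* 256 MiB per process */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        char *buf = malloc(NBYTES);
        memset(buf, rank & 0xff, NBYTES);

        char fname[256];
        snprintf(fname, sizeof(fname), "/shared/scratch/bench.%d", rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        FILE *fp = fopen(fname, "wb");
        if (fp == NULL) MPI_Abort(MPI_COMM_WORLD, 1);
        fwrite(buf, 1, NBYTES, fp);
        fclose(fp);

        MPI_Barrier(MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0) {
            double gib = (double)NBYTES * size / (1024.0 * 1024.0 * 1024.0);
            printf("aggregate: %.2f GiB in %.2f s -> %.2f GiB/s\n",
                   gib, t1 - t0, gib / (t1 - t0));
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }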
Hi,
A couple of comments. First, if you use MPI_File_write_at, this is usually not
considered collective I/O, even if it is executed by multiple processes;
MPI_File_write_at_all would be collective I/O.
Second, MPI I/O cannot do ‘magic’, but is bound by the hardware that you are
providing. If already a
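To make the first point concrete, a small sketch contrasting the two calls
(the file name, buffer, and offsets are placeholders; a real code would use
one variant or the other, not both):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "/shared/scratch/out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        double chunk[1024];
        for (int i = 0; i < 1024; i++) chunk[i] = rank;
        MPI_Offset offset = (MPI_Offset)rank * sizeof(chunk);

        /* independent write: each process acts on its own */
        MPI_File_write_at(fh, offset, chunk, 1024, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

        /* collective write: every process of the file's communicator must
         * call it, which lets the library aggregate and reorder requests */
        MPI_File_write_at_all(fh, offset, chunk, 1024, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }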
ompio only recently added support for GPFS, and it is only available in master
(so far). If you are using any of the released versions of Open MPI (2.x, 3.x,
4.x), you will not find this feature in ompio yet. Thus, the issue is only how
to disable GPFS in ROMIO. I could not immediately find an option
How is the performance if you leave a few cores for the OS, e.g. running with
60 processes instead of 64? The reasoning is that the file read operation is
really executed by the OS, and could potentially be quite resource intensive.
Thanks
Edgar
From: users On Behalf Of Ali Cherry via users
I am not an expert on the one-sided code in Open MPI, but I wanted to comment
briefly on the potential MPI I/O related item. As far as I can see, the error
message
“Read -1, expected 48, errno = 1”
does not stem from MPI I/O, at least not from the ompio library. What file
system did you use for t
Orion,
It might be a good idea. This bug is triggered by the fcoll/two_phase
component (and having spent just two minutes looking at it, I have a
suspicion what triggers it, namely an int vs. long conversion issue), so it is
probably unrelated to the other one.
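For what it is worth, a generic illustration of that kind of int vs. long
issue (not the actual Open MPI code, just the pattern): a byte count above
2^31-1 is truncated when squeezed into an int, while it survives in
long/MPI_Offset:

    #include <mpi.h>    /* only for the MPI_Offset typedef */
    #include <stdio.h>

    int main(void)
    {
        long long nelems = 600000000LL;          /* 600 million doubles        */
        int  bad      = (int)(nelems * 8);       /* 4.8 GB forced into 32 bits */
        MPI_Offset ok = (MPI_Offset)nelems * 8;  /* kept in 64 bits            */

        printf("int: %d   MPI_Offset: %lld\n", bad, (long long)ok);
        return 0;
    }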
I need to add running the ne
Never mind, I see it in the backtrace :-)
Will look into it, but I am currently traveling. Until then, Gilles' suggestion
is probably the right approach.
Thanks
Edgar
> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gabriel,
> Edgar via users
Orion,
I will look into this problem; is there a specific code or test case that
triggers it?
Thanks
Edgar
> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Orion
> Poplawski via users
> Sent: Thursday, October 24, 2019 11:56 PM
> To: Open MPI Users