Thanks,
Beichuan
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rob Latham
Sent: Thursday, September 18, 2014 08:49
To: us...@open-mpi.org
Subject: Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4
On 09/17/2014 05:46 PM, Beichuan Yan wrote:
Hi Rob,
As you pointed out in April, there are many cases that can trigger the
ADIOI_Set_lock error. My code writes to a file at a location specified
by a shared file pointer (a blocking, collective call):

    MPI_File_write_ordered(contactFile, const_cast<char*>(inf.str().c_str()),
                           length, MPI_CHAR, &status);

That is why disabling data sieving does not work for me, even though I
tested it with the latest openmpi-1.8.2 and gcc-4.9.1.
Can I ask a question? Apart from mounting Lustre with the "flock"
option, is there any other workaround to avoid this ADIOI_Set_lock
error in MPI-2 parallel I/O?
Shared file pointer operations don't get a lot of attention.
ROMIO is going to try to lock a hidden file that contains the 8-byte
location of the shared file pointer.
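To be concrete about the two families (placeholder arguments, just for
illustration): MPI_File_write_shared is the independent operation,
MPI_File_write_ordered the ordered-mode collective, and both consume
the same hidden shared pointer.

    /* independent: any rank may call this at any time */
    MPI_File_write_shared(fh, buf, count, MPI_CHAR, &status);

    /* ordered mode: collective, data lands in rank order */
    MPI_File_write_ordered(fh, buf, count, MPI_CHAR, &status);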
Do you mix independent shared file pointer operations with ordered
mode operations? If not, read on for a better way to achieve ordering:
It's pretty easy to replace ordered mode operations with a collective
call of the same behavior. The key is to use MPI_SCAN:
    MPI_Offset offset;
    long long incr, new_offset;  /* incr: how much data this rank writes */

    MPI_File_get_position(mpi_fh, &offset);
    /* inclusive prefix sum of every rank's contribution... */
    MPI_Scan(&incr, &new_offset, 1, MPI_LONG_LONG_INT,
             MPI_SUM, MPI_COMM_WORLD);
    /* ...minus our own contribution: this rank's starting offset */
    new_offset -= incr;
    new_offset += offset;
    ret = MPI_File_write_at_all(mpi_fh, new_offset, buf, count,
                                datatype, status);
See: every process has "incr" amount of data. The MPI_SCAN ensures
the computed offsets are ascending in rank order (as they would be for
ordered-mode I/O), and the actual I/O happens with a much faster
MPI_File_write_at_all.
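Put together, a self-contained version of the pattern might look like
the sketch below (the output filename and per-rank message are
illustrative choices of mine, not from any real code):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Offset offset;
        long long incr, new_offset;
        char buf[64];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* hypothetical output file */
        MPI_File_open(MPI_COMM_WORLD, "contact.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        snprintf(buf, sizeof(buf), "rank %d reporting\n", rank);
        incr = (long long) strlen(buf);

        MPI_File_get_position(fh, &offset);
        MPI_Scan(&incr, &new_offset, 1, MPI_LONG_LONG_INT, MPI_SUM,
                 MPI_COMM_WORLD);
        new_offset -= incr;   /* exclusive prefix sum */
        new_offset += offset;

        MPI_File_write_at_all(fh, (MPI_Offset) new_offset, buf, (int) incr,
                              MPI_CHAR, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }

Each rank contributes a line of different length, yet the lines land in
rank order with no hidden lock file involved.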
We wrote this up in our 2005 paper on using shared memory for shared
file pointers, even though this approach doesn't need RMA shared memory.
==rob
Thanks,
Beichuan
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rob Latham
Sent: Monday, April 14, 2014 14:24
To: Open MPI Users
Subject: Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4
On 04/08/2014 05:49 PM, Daniel Milroy wrote:
Hello,
The file system in question is indeed Lustre, and mounting with flock
isn't possible in our environment. I recommended the following
changes to the users' code:
Hi. I'm the ROMIO guy, though I do rely on the community to help me
keep the Lustre driver up to snuff.
    MPI_Info_set(info, "collective_buffering", "true");
    MPI_Info_set(info, "romio_lustre_ds_in_coll", "disable");
    MPI_Info_set(info, "romio_ds_read", "disable");
    MPI_Info_set(info, "romio_ds_write", "disable");
Passing these hints results in the same error as before. Are there any
other MPI options I can set?
I'd like to hear more about the workload generating these lock
messages, but I can tell you the situations in which ADIOI_SetLock
gets called:
- everywhere in NFS. If you have a Lustre file system exported to
some clients as NFS, those clients will use the NFS driver (er, that
might not be true unless you pick up a recent patch)
- when writing a non-contiguous region in the file, unless you disable
data sieving, as you did above (see the sketch after this list)
- note: you don't need to disable data sieving for reads, though you
might want to if the data sieving algorithm is wasting a lot of data.
- if atomic mode was set on the file (i.e. you called
MPI_File_set_atomicity)
- if you use any of the shared file pointer operations
- if you use any of the ordered mode collective operations
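To make the data-sieving case concrete (an illustration with made-up
block sizes, not from the users' actual code): each rank below writes
interleaved blocks, so an independent write is non-contiguous in the
file and, with data sieving enabled, becomes a locked read-modify-write.

    /* hypothetical layout: 4 blocks of 1024 bytes per rank, gaps between */
    MPI_Datatype filetype;
    MPI_Type_vector(4, 1024, 2 * 1024, MPI_CHAR, &filetype);
    MPI_Type_commit(&filetype);

    /* stagger the displacement so the ranks' blocks interleave */
    MPI_File_set_view(fh, (MPI_Offset) rank * 1024, MPI_CHAR, filetype,
                      "native", MPI_INFO_NULL);

    /* independent write through a holey view: without "romio_ds_write"
       disabled, ROMIO locks a region, reads it, merges, writes it back */
    MPI_File_write(fh, buf, 4 * 1024, MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_Type_free(&filetype);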
You've turned off data-sieving writes, which is what I would have
first guessed was triggering this lock message. So I guess you are
hitting one of the other cases.
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA