On 9/18/2014 9:12 PM, Rob Latham wrote:


On 09/18/2014 04:56 PM, Beichuan Yan wrote:
Rob,

Thank you very much for the suggestion. There are two independent
scenarios that use parallel I/O in my code:

1. MPI processes print conditionally: some processes print in the
current loop iteration (but may not print in the next), others do not
print in the current iteration (but may print in the next), and it
does not matter who prints first or last (NOT ordered). Clearly we
cannot use a collective call for this scenario because the printing is
conditional, and I don't need it to be ordered, so I chose
MPI_File_write_shared (non-collective, shared pointer, not ordered).
It works well if Lustre is mounted with "flock", but fails without
"flock".

In this scenario 1, we cannot use an individual pointer or an explicit
offset because we cannot predetermine the offset for each process.
That is why I had to use a shared file pointer.
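
For concreteness, here is a minimal sketch of scenario 1 (the file
name, condition, and buffer names are illustrative, not from my real
code):

    MPI_File fh;
    MPI_Status status;
    MPI_File_open(MPI_COMM_WORLD, "contact.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    for (int step = 0; step < nsteps; step++) {
        if (should_print(step)) {   /* only some ranks write this step */
            /* non-collective write through the shared file pointer;
               this is the call that needs "flock" on Lustre with ROMIO */
            MPI_File_write_shared(fh, buf, len, MPI_CHAR, &status);
        }
    }
    MPI_File_close(&fh);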

2. Each MPI process unconditionally prints to a shared file (even if
it prints nothing), and the order does not matter. Your suggestion
works for this scenario; in fact it is even simpler because order does
not matter. We have two options:

(2A) Use a shared file pointer: either MPI_File_write_shared
(non-collective) or MPI_File_write_ordered (collective) works, and no
offset needs to be predetermined, but this requires "flock".

(2B) Use an individual file pointer: e.g. MPI_File_seek (or
MPI_File_set_view) followed by MPI_File_write_all (collective). This
requires calculating the offset, which is pre-determinable, and does
not require "flock".

In summary, scenario 2 can avoid the "flock" requirement by using 2B,
but scenario 1 cannot.
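
Here is a minimal sketch of 2B (names like fh, base, len, and buf are
illustrative); it precomputes each rank's offset with MPI_Scan, in the
spirit of Rob's recipe quoted below:

    /* every rank contributes len bytes, possibly zero */
    MPI_Offset sum = 0, incr = (MPI_Offset)len;
    MPI_Scan(&incr, &sum, 1, MPI_OFFSET, MPI_SUM, MPI_COMM_WORLD);
    /* sum - incr = bytes written by lower-ranked processes;
       seek the individual pointer there and write collectively */
    MPI_File_seek(fh, base + (sum - incr), MPI_SEEK_SET);
    MPI_File_write_all(fh, buf, (int)len, MPI_CHAR, &status);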

Thanks for the report.  OMPIO might support shared file pointers better
-- Edgar Gabriel can comment on whether that's the case.  I'll dust off
our old RMA-based approach to shared file pointers.  It's not perfect,
but it might be useful for folks having difficulty with the file-backed
shared file pointer operations.

I think there would be one or two options with OMPIO. However, OMPIO is not part of the 1.6 series; it is only available starting from the 1.7 series.

The first option would be to use the "individual file" component for shared file pointers. It works only for write-only scenarios, and only if the user can live with minor inaccuracies in the ordering. The component basically lets every process write its data into a separate file; every write operation is extended with a time stamp and some other metadata. Upon MPI_File_close, the individual files are merged, using the time stamps to generate the final ordering of the items. However, since clocks on different nodes are not perfectly synchronized, the order of items in the final file might not be absolutely correct. No data item is lost, however, and the ordering within a process is correct.

The second option would be to use the "addproc" component, which spawns an additional process to manage the shared file pointer. This component is available on trunk, but was not included in the release for good reason: executing an MPI_Comm_spawn under the hood without the user being aware of it might be, let's say, surprising, and can have side effects (e.g. exceeding the allocation). If the user is aware of that and willing to accept it, it is fairly easy to set up.
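
Selecting either component would look roughly like this (treat the
exact MCA parameter names as assumptions to check against your build;
I am assuming OMPIO's shared file pointer components are selectable
through the "sharedfp" framework):

    mpirun --mca io ompio --mca sharedfp individual -np 16 ./a.out
    mpirun --mca io ompio --mca sharedfp addproc    -np 16 ./a.out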

Thanks
Edgar


==rob


Thanks,
Beichuan

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rob Latham
Sent: Thursday, September 18, 2014 08:49
To: us...@open-mpi.org
Subject: Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4



On 09/17/2014 05:46 PM, Beichuan Yan wrote:
Hi Rob,

As you pointed out in April, there are many cases that can trigger
the ADIOI_Set_lock error. My code writes to a file at a location
specified by a shared file pointer (a blocking, collective call):

    MPI_File_write_ordered(contactFile,
                           const_cast<char*>(inf.str().c_str()),
                           length, MPI_CHAR, &status);

That is why disabling data sieving does not work for me, even when I
tested with the latest openmpi-1.8.2 and gcc-4.9.1.

Can I ask a question? Other than mounting Lustre with the "flock"
option, is there any workaround to avoid this ADIOI_Set_lock error
in MPI-2 parallel I/O?


Shared file pointer operations don't get a lot of attention.

ROMIO is going to try to lock a hidden file that contains the 8-byte
location of the shared file pointer.

Do you mix independent shared file pointer operations with ordered
mode operations?  If not, read on for a better way to achieve ordering:

It's pretty easy to replace ordered mode operations with a collective
call that has the same behavior.  The key is to use MPI_Scan:

            /* incr holds the number of bytes this process will write */
            MPI_File_get_position(mpi_fh, &offset);

            /* inclusive prefix sum: bytes from ranks 0..myrank */
            MPI_Scan(&incr, &new_offset, 1, MPI_LONG_LONG_INT,
                            MPI_SUM, MPI_COMM_WORLD);
            new_offset -= incr;   /* now: bytes written by lower ranks only */
            new_offset += offset; /* start from the current file position */

            ret = MPI_File_write_at_all(mpi_fh, new_offset, buf, count,
                                    datatype, status);

See: every process has "incr" bytes of data.  The MPI_Scan ensures
the computed offsets ascend in rank order (as they would for ordered
mode I/O), and the actual I/O happens with a much faster
MPI_File_write_at_all.
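
(Illustrative numbers: if three ranks write 10, 0, and 5 bytes,
MPI_Scan yields 10, 10, and 15; subtracting each rank's own incr gives
offsets 0, 10, and 10 relative to the starting position -- exactly the
layout ordered mode would produce.)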

We wrote this up in our 2005 paper on shared memory for shared file
pointers, even though this approach doesn't need RMA shared memory.

==rob

Thanks,
Beichuan

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rob
Latham
Sent: Monday, April 14, 2014 14:24
To: Open MPI Users
Subject: Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4



On 04/08/2014 05:49 PM, Daniel Milroy wrote:
Hello,

The file system in question is indeed Lustre, and mounting with flock
isn't possible in our environment.  I recommended the following
changes to the users' code:

Hi.  I'm the ROMIO guy, though I do rely on the community to help me
keep the Lustre driver up to snuff.

MPI_Info_set(info, "collective_buffering", "true");
MPI_Info_set(info, "romio_lustre_ds_in_coll", "disable");
MPI_Info_set(info, "romio_ds_read", "disable");
MPI_Info_set(info, "romio_ds_write", "disable");

These result in the same error as before.  Are there any other MPI
options I can set?

I'd like to hear more about the workload generating these lock
messages, but I can tell you the situations in which ADIOI_SetLock
gets called:
- everywhere on NFS.  If you have a Lustre file system exported to
some clients as NFS, those clients get the NFS behavior (er, that
might not be true unless you pick up a recent patch)
- when writing a non-contiguous region of the file, unless you disable
data sieving, as you did above
  - note: you don't need to disable data sieving for reads, though you
  might want to if the data sieving algorithm is wasting a lot of data
- if atomic mode was set on the file (i.e. you called
MPI_File_set_atomicity)
- if you use any of the shared file pointer operations
- if you use any of the ordered mode collective operations

You've turned off data sieving for writes, which is what I would have
first guessed was triggering this lock message.  So I guess you are
hitting one of the other cases.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA



--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
