On 09/18/2014 04:56 PM, Beichuan Yan wrote:
Rob,

Thank you very much for the suggestion. There are two independent scenarios 
using parallel IO in my code:

1. MPI processes print conditionally: a process may print in the current loop but 
not in the next, or vice versa, and it does not matter who prints first or last 
(NOT ordered). Clearly we cannot use a collective call for this scenario because 
the printing is conditional, and since I don't need the output ordered I chose 
MPI_File_write_shared (a non-collective operation on the shared file pointer, not 
ordered). It works well when Lustre is mounted with "flock", but does not work 
without "flock".

In this scenario 1, we cannot use an individual file pointer or an explicit 
offset, because we cannot predetermine the offset for each process. That is why 
I had to use a shared file pointer.
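
For concreteness, here is a minimal sketch of the scenario 1 pattern (the file 
name, the message format, and the print condition are placeholders made up for 
illustration, not my actual code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    int rank, iter;
    char msg[128];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "progress.log",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    for (iter = 0; iter < 10; iter++) {
        /* placeholder condition: only some ranks print in a given loop */
        if ((rank + iter) % 3 == 0) {
            int len = snprintf(msg, sizeof(msg),
                               "rank %d, iteration %d\n", rank, iter);
            /* non-collective write through the shared file pointer;
               this is the call that needs "flock" on Lustre */
            MPI_File_write_shared(fh, msg, len, MPI_CHAR, &status);
        }
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}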

2. Each MPI process unconditionally prints to a shared file (even if it prints 
nothing) and the order does not matter. Your suggestion works for this scenario; 
actually it is even simpler because order does not matter. We have two options:

(2A) Use the shared file pointer: either MPI_File_write_shared (non-collective) 
or MPI_File_write_ordered (collective) works, and no offset needs to be 
predetermined, but this requires "flock".

(2B) Use an individual file pointer, e.g., MPI_File_seek (or MPI_File_set_view) 
followed by MPI_File_write_all (collective). This requires calculating the 
offset, which is pre-determinable, and it does not require "flock".
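
A minimal sketch of option 2B, under the simplifying assumption (mine, for 
illustration only) that every rank writes a fixed-length record, so the offset 
is simply rank * record length:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define RECLEN 64   /* hypothetical fixed record length in bytes */

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    int rank, n;
    char rec[RECLEN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* build a record padded to exactly RECLEN bytes */
    memset(rec, ' ', RECLEN);
    n = snprintf(rec, RECLEN, "rank %d wrote this record", rank);
    rec[n] = ' ';               /* overwrite snprintf's terminating NUL */
    rec[RECLEN - 1] = '\n';

    MPI_File_open(MPI_COMM_WORLD, "records.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* explicit, pre-determinable offset: no shared file pointer, no hidden
       lock file, and therefore no "flock" mount option required */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * RECLEN,
                          rec, RECLEN, MPI_CHAR, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}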

In summary, scenario 2 can avoid the "flock" requirement by using 2B, but 
scenario 1 cannot.

Thanks for the report. OMPIO might support shared file pointers better -- Edgar Gabriel can comment if that's the case. I'll dust off our old RMA-based approach for shared file pointers. It's not perfect, but for folks having difficulty with the file-backed shared file pointer operations it might be useful.

==rob


Thanks,
Beichuan

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rob Latham
Sent: Thursday, September 18, 2014 08:49
To: us...@open-mpi.org
Subject: Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4



On 09/17/2014 05:46 PM, Beichuan Yan wrote:
Hi Rob,

As you pointed out in April, there are many cases that can trigger the 
ADIOI_Set_lock error. My code writes to a file at a location specified by a 
shared file pointer (a blocking, collective call):

MPI_File_write_ordered(contactFile, const_cast<char*>(inf.str().c_str()),
                       length, MPI_CHAR, &status);

That is why disabling data sieving does not work for me, even though I tested it 
with the latest openmpi-1.8.2 and gcc-4.9.1.

May I ask a question: apart from mounting Lustre with the "flock" option, is 
there any other workaround that avoids this ADIOI_Set_lock error in MPI-2 
parallel I/O?


Shared file pointer operations don't get a lot of attention.

ROMIO is going to try to lock a hidden file that contains the 8-byte location 
of the shared file pointer.

Do you mix independent shared file pointer operations with ordered mode 
operations?  If not, read on for a better way to achieve ordering:

It's pretty easy to replace ordered mode operations with a collective call of 
the same behavior.  The key is to use MPI_SCAN:

            /* where this process's individual file pointer currently sits */
            MPI_File_get_position(mpi_fh, &offset);

            /* inclusive prefix sum of every rank's contribution (incr),
               then subtract my own to get my starting offset in rank order */
            MPI_Scan(&incr, &new_offset, 1, MPI_LONG_LONG_INT,
                            MPI_SUM, MPI_COMM_WORLD);
            new_offset -= incr;
            new_offset += offset;

            /* collective write at explicit offsets: no shared file pointer,
               no hidden lock file */
            ret = MPI_File_write_at_all(mpi_fh, new_offset, buf, count,
                                    datatype, status);

See: every process has "incr" amount of data to write. The MPI_Scan ensures the 
computed offsets are ascending in rank order (as they would be for ordered-mode 
I/O), and the actual I/O happens with a much faster MPI_File_write_at_all.

We wrote this up in our 2005 paper on shared memory for shared file pointers, 
even though this particular approach doesn't need RMA shared memory.
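
If it helps, here is the same idea as a self-contained program; the file name 
and the per-rank message are made up purely for illustration:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    MPI_Offset offset;
    long long incr, new_offset;
    int rank;
    char buf[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "ordered.txt",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* variable-length contribution per rank, as in ordered-mode i/o */
    incr = snprintf(buf, sizeof(buf), "hello from rank %d\n", rank);

    /* current position of the individual file pointer (0 right after open) */
    MPI_File_get_position(fh, &offset);

    /* inclusive prefix sum, then subtract my own contribution to get the
       exclusive prefix sum: my starting offset in rank order */
    MPI_Scan(&incr, &new_offset, 1, MPI_LONG_LONG_INT,
             MPI_SUM, MPI_COMM_WORLD);
    new_offset -= incr;
    new_offset += offset;

    MPI_File_write_at_all(fh, (MPI_Offset)new_offset, buf, (int)incr,
                          MPI_CHAR, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}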

==rob

Thanks,
Beichuan

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rob
Latham
Sent: Monday, April 14, 2014 14:24
To: Open MPI Users
Subject: Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4



On 04/08/2014 05:49 PM, Daniel Milroy wrote:
Hello,

The file system in question is indeed Lustre, and mounting with flock
isn't possible in our environment.  I recommended the following
changes to the users' code:

Hi.  I'm the ROMIO guy, though I do rely on the community to help me keep the 
Lustre driver up to snuff.

MPI_Info_set(info, "collective_buffering", "true");
MPI_Info_set(info, "romio_lustre_ds_in_coll", "disable");
MPI_Info_set(info, "romio_ds_read", "disable"); MPI_Info_set(info,
"romio_ds_write", "disable");

This results in the same error as before.  Are there any other MPI options I 
can set?

I'd like to hear more about the workload generating these lock messages, but I 
can tell you the situations in which ADIOI_SetLock gets called:
- everywhere on NFS.  If you have a Lustre file system exported to some clients 
  as NFS, you'll get the NFS behavior (er, that might not be true unless you 
  pick up a recent patch)
- when writing a non-contiguous region in the file, unless you disable data 
  sieving, as you did above.
  (note: you don't need to disable data sieving for reads, though you might 
  want to if the data sieving algorithm is wasting a lot of data.)
- if atomic mode was set on the file (i.e. you called MPI_File_set_atomicity)
- if you use any of the shared file pointer operations
- if you use any of the ordered mode collective operations

You've turned off data sieving for writes, which is what I would have first 
guessed was triggering this lock message.  So I guess you are hitting one of 
the other cases.
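
As an aside, one way to double-check which hints the MPI-IO layer actually 
honored is to read them back after the open.  A sketch (dump_hints is just a 
throwaway helper name; call it on a single rank to avoid duplicated output):

#include <mpi.h>
#include <stdio.h>

/* print every hint the implementation reports for an open file, e.g. to
   confirm that romio_ds_write really came back as "disable" */
void dump_hints(MPI_File fh)
{
    MPI_Info info_used;
    int nkeys, i, flag;
    char key[MPI_MAX_INFO_KEY + 1], value[MPI_MAX_INFO_VAL + 1];

    MPI_File_get_info(fh, &info_used);
    MPI_Info_get_nkeys(info_used, &nkeys);
    for (i = 0; i < nkeys; i++) {
        MPI_Info_get_nthkey(info_used, i, key);
        MPI_Info_get(info_used, key, MPI_MAX_INFO_VAL, value, &flag);
        if (flag)
            printf("%s = %s\n", key, value);
    }
    MPI_Info_free(&info_used);
}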

==rob

--
Rob Latham
Mathematics and Computer Science Division Argonne National Lab, IL USA


--
Rob Latham
Mathematics and Computer Science Division Argonne National Lab, IL USA 


--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
