"Gabriel, Edgar via users" <users@lists.open-mpi.org> writes:
>> How should we know that's expected to fail? It at least shouldn't fail like
>> that; set_atomicity doesn't return an error (which the test is prepared for
>> on a filesystem like pvfs2).
>> I assume doing nothing, but appearing to, can lead to corrupt data, and I'm
>> surprised that isn't being seen already.
>> HDF5 requires atomicity -- at least to pass its tests -- so presumably
>> anyone like us who needs it should use something mpich-based with recent or
>> old romio, and that sounds like most general HPC systems.
>> Am I missing something?
>> With the current romio everything I tried worked, but we don't get that
>> option with openmpi.

> First of all, it is mentioned on the FAQ sites of Open MPI, although
> admittedly it is not entirely update (it lists external32 support also
> as missing, which is however now available since 4.1).

Yes, the FAQ was full of confusing obsolete material when I last looked.
Anyway, users can't be expected to check whether any particular operation is
expected to fail silently.

I should have said that MPI_File_set_atomicity(3) explicitly says the default
is true for multiple nodes, and doesn't say the call is a no-op with the
default implementation. I don't know whether the MPI spec allows not
implementing it, but I at least expect an error return if it doesn't. As far
as I remember, that's what romio does on a filesystem like pvfs2 (or lustre
when people know better than implementers and insist on noflock); I
mis-remembered from before, thinking that ompio would be changed to do the
same. From that thread, I did think atomicity was on its way. Presumably an
application requests atomicity for good reason, and can take appropriate
action if the status indicates it's not available on that filesystem.
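
To make the expectation concrete, this is roughly the pattern I have in
mind -- just a sketch with a placeholder file name, not a claim about what
ompio or romio currently do: request atomic mode, and let the application
find out, from the return code or by reading the mode back, whether it
actually got it.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int rc, atomic = 0;

    MPI_Init(&argc, &argv);

    /* "shared.dat" is just a placeholder path on the shared filesystem. */
    rc = MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                       MPI_MODE_CREATE | MPI_MODE_RDWR,
                       MPI_INFO_NULL, &fh);
    if (rc != MPI_SUCCESS)
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* File handles default to MPI_ERRORS_RETURN, so rc is usable here. */
    rc = MPI_File_set_atomicity(fh, 1);
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "atomic mode refused on this filesystem\n");

    /* Belt and braces: check what mode is actually in effect. */
    MPI_File_get_atomicity(fh, &atomic);
    if (!atomic)
        fprintf(stderr, "atomic mode not in effect despite the request\n");

    /* ... do the atomicity-dependent I/O here, or fall back to the
       MPI_File_sync-plus-barrier pattern if atomic mode isn't available ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

If the call succeeds silently but has no effect, neither check helps, which
is exactly the failure mode I'm worried about.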

> You don't need atomicity for the HDF5 tests, we are passing all of them to
> the best my knowledge, and this is one of the testsuites that we do run
> regularly as part of our standard testing process.

I guess we're just better at breaking things.

> I am aware that they have an atomicity test - which we pass for whatever
> reason. This highlight also btw the issue(s) that I am having with the
> atomicity option in MPI I/O.

I don't know what the application of atomicity in HDF5 is. Maybe it isn't
required for typical operations, but I assume it's not used blithely.
However, I'd have thought HDF5 should be prepared for something like pvfs2,
and at least not abort the test at that stage.

I've learned to be wary of declaring concurrent systems working after a few
tests. In fact, the phdf5 test failed for me like this when I tried across
four lustre client nodes with 4.1's defaults. (I'm confused about the
striping involved, because I thought I set it to four, and now it shows as
one on that directory.)

...
Testing  -- dataset atomic updates (atomicity)
Proc 9: *** Parallel ERRProc 54: *** Parallel ERROR ***
VRFY (H5Sset_hyperslab succeeded) failed at line 4293 in t_dset.c
aborting MPI proceProc 53: *** Parallel ERROR ***

Unfortunately I hadn't turned on backtracing, and I wouldn't get another job
through for a while.

> The entire infrastructure to enforce atomicity is actually in place in ompio,
> and I can give you the option on how to enforce strict atomic behavior for
> all files in ompio (just not on a per file basis), just be aware that the
> performance will nose-dive. This is not just the case with ompio, but also in
> romio, you can read up on that various discussion boards on that topic, look
> at NFS related posts (where you need the atomicity for correctness in
> basically all scenarios).

I'm fairly sure I accidentally ran tests successfully on NFS4, at least
single-node. I never found a good discussion of the topic, and what I have
seen about "NFS" was probably specific to NFS3 and non-POSIX compliance,
though I don't actually care about parallel i/o on NFS. The information we
got about lustre was direct from Rob Latham, as nothing showed up online.

I don't like fast-but-wrong, so I think there should be the option of
correctness, especially as it's the documented default.

> Just as another data point, in the 8+ years that ompio has been available,
> there was not one issue reported related to correctness due to missing the
> atomicity option.

Yes, I forget some history over the years, like that one on a local
filesystem:
<https://www.mail-archive.com/users@lists.open-mpi.org/msg32752.html>.

> That being said, if you feel more comfortable using romio, it is completely
> up to you. Open MPI offers this option, and it is incredibly easy to set the
> default parameters on a platform for all users such that romio is being used.

Unfortunately that option fails the tests. (The kind of selection involved
is sketched at the end of this message.)

> We are doing with our limited resources the best we can, and while ompio is
> by no means perfect, we try to be responsive to issues reported by users and
> value constructive feedback and discussion.

I'm sorry to sound off, but experience -- not just mine -- is that issues
typically don't get resolved; Mark's issue has been open for a year. It
probably doesn't help that people are even told off for using the issue
tracker. Generally it's not surprising if there's a shortage of effort when
outside contributions seem unwelcome. I've tried to contribute several
times. The final attempt wasted two or three days, after being encouraged to
get the port of current romio into a decent state when it was being done
separately "behind the scenes", but that hasn't been released.
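
P.S. For concreteness, the romio selection referred to above would be
something like the following (a sketch; I'm assuming the ROMIO component in
a 4.1-series build is named romio321):

# per run
mpirun --mca io romio321 ...

# or for all users on a platform, in $PREFIX/etc/openmpi-mca-params.conf
# (or per user in ~/.openmpi/mca-params.conf)
io = romio321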