On Dec 19, 2014, at 10:44 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> Regarding your second point, while I do tend to agree that such an issue
> is better addressed in the MPI Forum, the last attempt to fix this was
> certainly not a resounding success.

Yeah, fair enough -- but it wasn't a failure, either. It could definitely be
moved forward, but it will take time/effort, which I unfortunately don't
have. I would be willing, however, to spin up someone who *does* have
time/effort available to move the proposal forward.

> Indeed, there is a slight window of opportunity for inconsistencies in the
> recursive behavior.

You're right; it's a small window in the threading case, but a) that's the
worst kind :-), and b) the non-threaded case is actually worse (because the
global state can change from underneath the loop).

> But the inconsistencies were already in the code, especially in the
> single-threaded case. As we never received any complaints related to this
> topic, I did not deem it interesting to address them with my last commit.
> Moreover, the specific behavior needed by PETSc is available in Open MPI
> when compiled without thread support, as the only thing that "protects"
> the attributes is that global mutex.

Mmmm. Ok, I see your point. But this is a (very) slippery slope.

>> For example, in ompi_attr_delete_all(), it gets the count of all
>> attributes and then loops <count> times to delete each attribute. But
>> each attribute callback can now insert or delete attributes on that
>> entity. This can mean that the loop can either fail to delete an
>> attribute (because some attribute callback already deleted it) or fail
>> to delete *all* attributes (because some attribute callback added more).
>
> To be extremely precise, the deletion part is always correct

...as long as the hash map is not altered from the application (e.g., by
adding or deleting another attribute during a callback). I understand that
you mentioned above that you're not worried about this case. I'm just
picking here because there is quite definitely a case where the loop is
*not* correct. PETSc apparently doesn't trigger this badness, but... like I
said above, it's a (very) slippery slope.

> as it copies the values to be deleted into a temporary array before
> calling any callbacks (and before releasing the mutex), so we only remove
> what was in the object attribute hash when the function was called. Don't
> misunderstand: we have an extremely good reason to do it this way; we need
> to call the callbacks in the order in which they were created (mandated by
> the MPI standard).

>> ompi_attr_copy_all() has similar problems -- in general, the hash that it
>> is looping over can change underneath it.
>
> For the copy it is a little bit more tricky, as the calling order is not
> imposed. Our peculiar implementation of the hash table (with an array)
> makes the code work, with a single (possibly minor) exception when the
> hash table itself is grown between two calls. However, as stated before,
> this issue was already present in the code in the single-threaded case for
> years. Addressing it is another two-line patch, but I leave this exercise
> to an interested reader.

Yeah, thanks for that. :-)

To be clear: both the copy and the delete code could be made thread safe. I
just don't think we should be encouraging users to be exercising undefined /
probably non-portable MPI code.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
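
P.S. Purely for illustration, here is a minimal, standalone sketch of the
"snapshot the keyvals before invoking the callbacks" pattern discussed
above. This is *not* the actual Open MPI code -- the names (attr_entry,
attr_set, delete_all, noisy_delete) and the fixed-size array are made up --
it just shows why a callback that adds attributes cannot extend the loop,
and why a callback that deletes an already-snapshotted attribute leaves the
small window George mentions.

#include <stdio.h>

/* One attribute: a keyval, a value, and a user-supplied delete callback. */
typedef struct attr_entry {
    int keyval;
    void *value;
    int (*delete_fn)(int keyval, void *value);
    int in_use;                      /* cleared once the entry is deleted */
} attr_entry;

/* A toy attribute table, kept as a flat array in creation order. */
typedef struct attr_set {
    attr_entry entries[16];
    int count;
} attr_set;

static int noisy_delete(int keyval, void *value)
{
    (void)value;
    printf("delete callback for keyval %d\n", keyval);
    return 0;
}

/* Delete every attribute, invoking callbacks in creation order.
 * The keyvals are snapshotted into a temporary array first, so a callback
 * that adds attributes cannot extend this loop, and a callback that
 * removes one of the snapshotted attributes is simply skipped -- which is
 * exactly the residual window discussed above. */
static void delete_all(attr_set *set)
{
    int snapshot[16];
    int i, j, n = set->count;

    for (i = 0; i < n; ++i)
        snapshot[i] = set->entries[i].keyval;

    for (i = 0; i < n; ++i) {
        for (j = 0; j < set->count; ++j) {
            attr_entry *e = &set->entries[j];
            if (e->in_use && e->keyval == snapshot[i]) {
                e->delete_fn(e->keyval, e->value);
                e->in_use = 0;
                break;
            }
        }
    }
    set->count = 0;
}

int main(void)
{
    attr_set set = { .count = 0 };
    int k;

    for (k = 1; k <= 3; ++k) {
        attr_entry e = { k, NULL, noisy_delete, 1 };
        set.entries[set.count++] = e;
    }
    delete_all(&set);   /* callbacks fire for keyvals 1, 2, 3, in order */
    return 0;
}

Compile it with any C compiler and it just prints the three callbacks firing
in creation order; the real code obviously also has the mutex and the hash
table growth to worry about.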