On Dec 19, 2014, at 10:44 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> Regarding your second point, while I do tend to agree that such an issue is 
> better addressed in the MPI Forum, the last attempt to fix this was certainly 
> not a resounding success.

Yeah, fair enough -- but it wasn't a failure, either.  The proposal could 
definitely be moved forward, but that will take time/effort, which I 
unfortunately don't have.  I would be willing, however, to spin up someone who 
*does* have the time/effort available to move it forward.

> Indeed, there is a slight window of opportunity for inconsistencies in the 
> recursive behavior.

You're right; it's a small window in the threading case, but a) that's the 
worst kind :-), and b) the non-threaded case is actually worse (because the 
global state can change from underneath the loop).

> But the inconsistencies were already in the code, especially in the 
> single-threaded case. As we never received any complaints related to this 
> topic, I did not deem it worthwhile to address them in my last commit. 
> Moreover, the specific behavior needed by PETSc is available in Open MPI when 
> compiled without thread support, as the only thing that "protects" the 
> attributes is that global mutex.

Mmmm.  Ok, I see your point.  But this is a (very) slippery slope.

>> For example, in ompi_attr_delete_all(), it gets the count of all attributes 
>> and then loops <count> times to delete each attribute.  But each attribute 
>> callback can now insert or delete attributes on that entity.  This can mean 
>> that the loop either fails to delete an attribute (because some attribute 
>> callback already deleted it) or fails to delete *all* attributes (because 
>> some attribute callback added more).
> 
> To be extremely precise, the deletion part is always correct

...as long as the hash map is not altered by the application (e.g., by adding 
or deleting another attribute during a callback).

I understand from what you wrote above that you're not worried about this 
case.  I'm just being picky here because there is quite definitely a case 
where the loop is *not* correct.  PETSc apparently doesn't trigger this 
badness, but... like I said above, it's a (very) slippery slope.
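
To make the badness concrete, here's a minimal example of the pattern (purely 
hypothetical -- this is not PETSc's code, just an illustration of the 
recursive case): a delete callback that deletes another attribute on the same 
communicator while that communicator is being freed.

/* Hypothetical illustration: a delete callback that mutates the
 * attribute hash of the very communicator being freed. */
#include <mpi.h>

static int key_a, key_b;

static int delete_a(MPI_Comm comm, int keyval, void *attr_val,
                    void *extra_state)
{
    /* Deleting another attribute from inside a delete callback changes
     * the hash that ompi_attr_delete_all() is iterating over. */
    MPI_Comm_delete_attr(comm, key_b);
    return MPI_SUCCESS;
}

int main(int argc, char **argv)
{
    MPI_Comm dup;

    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_WORLD, &dup);

    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, delete_a, &key_a, NULL);
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &key_b, NULL);
    MPI_Comm_set_attr(dup, key_a, NULL);
    MPI_Comm_set_attr(dup, key_b, NULL);

    /* Freeing the communicator fires delete_a, which deletes key_b
     * while the attribute list is being walked. */
    MPI_Comm_free(&dup);

    MPI_Finalize();
    return 0;
}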

> as it copies the values to be deleted into a temporary array before calling 
> any callbacks (and before releasing the mutex), so we only remove what was in 
> the object attribute hash when the function was called. Don't misunderstand: 
> we have an extremely good reason to do it this way; we need to call the 
> callbacks in the order in which they were created (mandated by the MPI 
> standard).
>  
>> ompi_attr_copy_all() has similar problems -- in general, the hash that it is 
>> looping over can change underneath it.
> 
> For the copy it is a little bit more tricky, as the calling order is not 
> imposed. Our peculiar implementation of the hash table (with an array) makes 
> the code work, with a single (possibly minor) exception: when the hash table 
> itself is grown between two calls. However, as stated before, this issue was 
> already present in the code in the single-threaded case for years. Addressing 
> it is another two-line patch, but I leave this exercise to an interested 
> reader.

Yeah, thanks for that.  :-)
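
For anyone following along at home: the delete-path pattern George describes 
(snapshot the attributes first, then run the callbacks in creation order) 
looks roughly like the toy below.  This is a self-contained sketch with 
made-up names, not the actual Open MPI source.

/* Toy sketch of the snapshot-then-callback pattern (hypothetical
 * names; locking elided).  The snapshot is taken before any user
 * callback runs, so callbacks that add or delete attributes cannot
 * perturb this loop. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*delete_cb_t)(int keyval, void *value);

typedef struct {
    int         keyval;
    void       *value;
    delete_cb_t del_cb;
} attr_t;

/* A toy "hash": an array kept in creation order. */
static attr_t attrs[16];
static int    nattrs = 0;

static void delete_all_attrs(void)
{
    /* 1. Snapshot (in the real code: under the mutex, before any
     *    callback is invoked). */
    int     count = nattrs;
    attr_t *snap  = malloc(count * sizeof(attr_t));
    memcpy(snap, attrs, count * sizeof(attr_t));
    nattrs = 0;  /* remove exactly what was present at entry */

    /* 2. Run the callbacks on the snapshot, in creation order (the
     *    ordering George says is required); attributes added by a
     *    callback after this point are deliberately not visited. */
    for (int i = 0; i < count; ++i)
        if (snap[i].del_cb)
            snap[i].del_cb(snap[i].keyval, snap[i].value);

    free(snap);
}

static void print_cb(int keyval, void *value)
{
    (void)value;
    printf("deleting attribute %d\n", keyval);
}

int main(void)
{
    attrs[nattrs++] = (attr_t){ .keyval = 1, .del_cb = print_cb };
    attrs[nattrs++] = (attr_t){ .keyval = 2, .del_cb = print_cb };
    delete_all_attrs();   /* prints 1, then 2: creation order */
    return 0;
}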

To be clear: both the copy and the delete code could be made thread-safe.  I 
just don't think we should be encouraging users to exercise undefined / 
probably non-portable MPI code.
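
And for George's "interested reader": one way (of several) to harden the copy 
loop against the table changing underneath it is to walk a snapshot of the 
keys and re-check each key's liveness before invoking its copy callback.  
Again a hypothetical sketch -- not Open MPI code, and surely not the two-line 
patch George has in mind.

/* Hypothetical sketch (locking elided): copy callbacks are invoked
 * from a snapshot of the keys taken at entry, skipping any key that
 * a callback has deleted in the meantime. */
#include <stdio.h>
#include <stdlib.h>

typedef void (*copy_cb_t)(int keyval, void *value);

typedef struct {
    int       keyval;
    void     *value;
    copy_cb_t copy_cb;
    int       live;    /* 0 once deleted */
} attr_t;

static attr_t attrs[16];
static int    nattrs = 0;

static void copy_all_attrs(void)
{
    /* Snapshot just the keys present at entry. */
    int  count = nattrs;
    int *keys  = malloc(count * sizeof(int));
    for (int i = 0; i < count; ++i)
        keys[i] = attrs[i].keyval;

    for (int i = 0; i < count; ++i) {
        /* Re-check: a callback may have deleted this key since the
         * snapshot was taken; if so, there is nothing left to copy. */
        for (int j = 0; j < nattrs; ++j) {
            if (attrs[j].live && attrs[j].keyval == keys[i]) {
                if (attrs[j].copy_cb)
                    attrs[j].copy_cb(keys[i], attrs[j].value);
                break;
            }
        }
    }
    free(keys);
}

static void print_cb(int keyval, void *value)
{
    (void)value;
    printf("copying attribute %d\n", keyval);
}

int main(void)
{
    attrs[nattrs++] = (attr_t){ .keyval = 1, .copy_cb = print_cb, .live = 1 };
    attrs[nattrs++] = (attr_t){ .keyval = 2, .copy_cb = print_cb, .live = 1 };
    copy_all_attrs();
    return 0;
}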

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
