It's likely a BIOS bug. But I can't say more until you send the relevant data as explained earlier.

Brice
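For reference, something along these lines should produce that data (a rough sketch, assuming hwloc and dmidecode are installed and that the /tmp prefix and hostname-based filename are acceptable; adjust as needed):

    # Kernel and BIOS versions, as referred to in the hwloc FAQ advice quoted below
    uname -r
    sudo dmidecode -s bios-version
    sudo dmidecode -s bios-release-date

    # Gather the topology data to attach to a report on the hwloc-users list;
    # this is expected to write /tmp/<hostname>.tar.bz2 and /tmp/<hostname>.output
    hwloc-gather-topology /tmp/$(hostname)

The resulting tarball is what the hwloc error message and the FAQ entry ask to be sent to the hwloc-users list.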
On 20/12/2014 18:10, Sergio Manzetti wrote:
> Dear Brice, the BIOS is the latest. However, I wonder if this could be a hardware error, as the Open MPI sources claim. Is there any way to find out if this is a hardware error?
>
> Thanks
>
> > From: users-requ...@open-mpi.org
> > Subject: users Digest, Vol 3074, Issue 1
> > To: us...@open-mpi.org
> > Date: Sat, 20 Dec 2014 12:00:02 -0500
> >
> > Today's Topics:
> >
> >    1. Re: Deadlock in OpenMPI 1.8.3 and PETSc 3.4.5 (Jeff Squyres (jsquyres))
> >    2. Hwloc error with Openmpi 1.8.3 on AMD 64 (Sergio Manzetti)
> >    3. Re: Hwloc error with Openmpi 1.8.3 on AMD 64 (Brice Goglin)
> >    4. best function to send data (Diego Avesani)
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 19 Dec 2014 19:26:58 +0000
> > From: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
> > To: "Open MPI User's List" <us...@open-mpi.org>
> > Cc: "petsc-ma...@mcs.anl.gov" <petsc-ma...@mcs.anl.gov>
> > Subject: Re: [OMPI users] Deadlock in OpenMPI 1.8.3 and PETSc 3.4.5
> >
> > On Dec 19, 2014, at 10:44 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> >
> > > Regarding your second point, while I do tend to agree that such an issue is better addressed in the MPI Forum, the last attempt to fix this was certainly not a resounding success.
> >
> > Yeah, fair enough -- but it wasn't a failure, either. It could definitely be moved forward, but it will take time/effort, which I unfortunately don't have. I would be willing, however, to spin up someone who *does* have time/effort available to move the proposal forward.
> >
> > > Indeed, there is a slight window of opportunity for inconsistencies in the recursive behavior.
> >
> > You're right; it's a small window in the threading case, but a) that's the worst kind :-), and b) the non-threaded case is actually worse (because the global state can change from underneath the loop).
> >
> > > But the inconsistencies were already in the code, especially in the single-threaded case. As we never received any complaints related to this topic, I did not deem it interesting to address them with my last commit. Moreover, the specific behavior needed by PETSc is available in Open MPI when compiled without thread support, as the only thing that "protects" the attributes is that global mutex.
> >
> > Mmmm. Ok, I see your point. But this is a (very) slippery slope.
> >
> > > > For example, in ompi_attr_delete_all(), it gets the count of all attributes and then loops <count> times to delete each attribute. But each attribute callback can now insert or delete attributes on that entity. This can mean that the loop can either fail to delete an attribute (because some attribute callback already deleted it) or fail to delete *all* attributes (because some attribute callback added more).
> >
> > > To be extremely precise, the deletion part is always correct
> >
> > ...as long as the hash map is not altered from the application (e.g., by adding or deleting another attribute during a callback).
> >
> > I understand that you mention above that you're not worried about this case. I'm just picking here because there is quite definitely a case where the loop is *not* correct. PETSc apparently doesn't trigger this badness, but... like I said above, it's a (very) slippery slope.
> >
> > > as it copies the values to be deleted into a temporary array before calling any callbacks (and before releasing the mutex), so we only remove what was in the object attribute hash when the function was called. Don't misunderstand: we have an extremely good reason to do it this way; we need to call the callbacks in the order in which they were created (mandated by the MPI standard).
> >
> > > > ompi_attr_copy_all() has similar problems -- in general, the hash that it is looping over can change underneath it.
> >
> > > For the copy it is a little bit more tricky, as the calling order is not imposed. Our peculiar implementation of the hash table (with an array) makes the code work, with a single (possibly minor) exception when the hash table itself is grown between 2 calls. However, as stated before, this issue was already present in the code in single-threaded cases for years. Addressing it is another 2-line patch, but I leave this exercise to an interested reader.
> >
> > Yeah, thanks for that. :-)
> >
> > To be clear: both the copy and the delete code could be made thread-safe. I just don't think we should be encouraging users to be exercising undefined / probably non-portable MPI code.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Fri, 19 Dec 2014 20:58:46 +0100
> > From: Sergio Manzetti <sergio.manze...@outlook.com>
> > To: "us...@open-mpi.org" <us...@open-mpi.org>
> > Subject: [OMPI users] Hwloc error with Openmpi 1.8.3 on AMD 64
> >
> > Dear all, when trying to run NWchem with openmpi, I get this error:
> >
> > ****************************************************************************
> > * Hwloc has encountered what looks like an error from the operating system.
> > *
> > * object intersection without inclusion!
> > * Error occurred in topology.c line 594
> > *
> > * Please report this error message to the hwloc user's mailing list,
> > * along with the output from the hwloc-gather-topology.sh script.
> >
> > Is there any rationale for solving this?
> >
> > Thanks
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Fri, 19 Dec 2014 21:13:19 +0100
> > From: Brice Goglin <brice.gog...@inria.fr>
> > To: Open MPI Users <us...@open-mpi.org>
> > Subject: Re: [OMPI users] Hwloc error with Openmpi 1.8.3 on AMD 64
> >
> > Hello,
> >
> > The rationale is to read the message and do what it says :)
> >
> > Have a look at www.open-mpi.org/projects/hwloc/doc/v1.10.0/a00028.php#faq_os_error
> > Try upgrading your BIOS and kernel.
> >
> > Otherwise install hwloc and send the output (tarball) of hwloc-gather-topology to hwloc-users (not to OMPI users).
> >
> > Thanks
> > Brice
> >
> > On 19/12/2014 20:58, Sergio Manzetti wrote:
> > > Dear all, when trying to run NWchem with openmpi, I get this error:
> > >
> > > * object intersection without inclusion!
> > > * Error occurred in topology.c line 594
> > >
> > > Is there any rationale for solving this?
> > >
> > > Thanks
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Fri, 19 Dec 2014 23:56:36 +0100
> > From: Diego Avesani <diego.aves...@gmail.com>
> > To: Open MPI Users <us...@open-mpi.org>
> > Subject: [OMPI users] best function to send data
> >
> > Dear all users,
> > I am new to the MPI world. I would like to know the best choice among the different functions, and what each one means.
> >
> > In my program I would like each process to send a vector of data to all the other processes. What do you suggest? Is MPI_Bcast correct, or am I missing something?
> >
> > Thanks a lot
> >
> > Diego
> >
> > ------------------------------
> >
> > End of users Digest, Vol 3074, Issue 1
> > **************************************