Re: [OMPI users] [petsc-dev] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread Satish Balay
Great! Thanks for checking. Satish. On Thu, 30 Apr 2015, George Bosilca wrote: > I went over the code and in fact I think it is correct as is. The length is > for the local representation, which indeed uses pointers to datatype > structures. By contrast, the total_pack_size represents the amount …

Re: [OMPI users] [petsc-dev] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread George Bosilca
I went over the code and in fact I think it is correct as is. The length is for the local representation, which indeed uses pointers to datatype structures. By contrast, the total_pack_size represents the amount of space we would need to store the data in a format that can be sent to another pe…
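
As a minimal sketch of the distinction George is drawing (the struct and function names below are invented for illustration, not Open MPI's actual internals), the local description of a derived datatype can hold pointers, so its in-memory length is not the same quantity as the pointer-free byte count needed to ship the description to another process:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical datatype description, for illustration only. */
typedef struct my_dtype {
    int32_t id;                 /* portable identifier of this type    */
    size_t  count;              /* number of component types           */
    struct my_dtype **children; /* local-only pointers, not shippable  */
} my_dtype_t;

/* Local length: counts the pointer array, which is meaningless on
 * another process. */
static size_t local_length(const my_dtype_t *d)
{
    return sizeof(*d) + d->count * sizeof(my_dtype_t *);
}

/* Analogue of total_pack_size: the wire format replaces each child
 * pointer with a fixed-size id, so the two sizes legitimately differ. */
static size_t packed_size(const my_dtype_t *d)
{
    return sizeof(d->id) + sizeof(d->count) + d->count * sizeof(int32_t);
}
```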

Re: [OMPI users] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread Jeff Squyres (jsquyres)
Per Satish's last mail (http://www.open-mpi.org/community/lists/users/2015/04/26823.php), George is looking at a follow-up issue... > On Apr 30, 2015, at 2:57 PM, Ralph Castain wrote: > > Thanks! The patch wasn’t quite correct, but we have a good one now and it is > going into 1.8.5 (just squeaked in before release) …

Re: [OMPI users] [petsc-dev] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread George Bosilca
In the packed representation we store not MPI_Datatypes but a handcrafted id for each one. The two codes should have been in sync. I'm looking at another issue right now, and I'll come back to this one right after. Thanks for paying attention to the code. George. On Thu, Apr 30, 2015 at 3:13 PM, …
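
A hedged illustration of the "handcrafted id" idea (the lookup below is invented for exposition; the real mapping lives in ompi/datatype/ompi_datatype_args.c):

```c
#include <mpi.h>
#include <stdint.h>

/* Illustrative only: translate a predefined MPI datatype handle into
 * a small integer id that is safe to put in the packed (wire)
 * representation, instead of a process-local pointer. */
static int32_t dtype_to_id(MPI_Datatype t)
{
    if (t == MPI_INT)    return 1;
    if (t == MPI_DOUBLE) return 2;
    if (t == MPI_CHAR)   return 3;
    return -1; /* derived types would pack their full description */
}
```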

Re: [OMPI users] [petsc-dev] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread Jeff Squyres (jsquyres)
Oops -- that was a mistake from George when he committed the fix, and I just propagated that mistake into the v1.8 pull request. I'll fix it there, at least. But the master commit message is unfortunately going to have to stay wrong. :-( > On Apr 30, 2015, at 2:59 PM, Matthew Knepley wrote: …

Re: [OMPI users] [petsc-dev] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread Satish Balay
Thanks for checking and getting a more appropriate fix in. I've just tried this out - and the PETSc test code runs fine with it. BTW: There is one inconsistency in ompi/datatype/ompi_datatype_args.c [that I noticed] - that you might want to check. Perhaps the second line should be "(DC) * sizeof …

Re: [OMPI users] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread Ralph Castain
Thanks! The patch wasn’t quite correct, but we have a good one now and it is going into 1.8.5 (just squeaked in before release) > On Apr 29, 2015, at 9:50 PM, Satish Balay wrote: > > OpenMPI developers, > > We've had issues (memory errors) with OpenMPI - and code in PETSc > library that uses MPI_Win_fence() …

Re: [OMPI users] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread Jeff Squyres (jsquyres)
Thank you! George reviewed your patch and adjusted it a bit. We applied it to master and it's pending for the release series (v1.8.x). Would you mind testing a nightly master snapshot? It should be in tonight's build: http://www.open-mpi.org/nightly/master/ > On Apr 30, 2015, at 12:50 …

Re: [OMPI users] new hwloc error

2015-04-30 Thread Ralph Castain
The planning is pretty simple: at startup, mpirun launches a daemon on each node. If --hetero-nodes is provided, each daemon returns the topology discovered by hwloc - otherwise, only the first daemon does. mpirun then assigns procs to each node in a round-robin fashion (assuming you haven’t told …
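
As a toy model of the round-robin step Ralph describes (invented names, not mpirun's actual mapper code):

```c
#include <stdio.h>

/* Toy bynode-style mapper: deal nprocs ranks across nnodes nodes,
 * one per node per pass. */
static void map_round_robin(int nprocs, int nnodes)
{
    for (int rank = 0; rank < nprocs; rank++)
        printf("rank %d -> node %d\n", rank, rank % nnodes);
}

int main(void)
{
    map_round_robin(8, 3); /* 8 ranks over 3 identical nodes */
    return 0;
}
```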

Re: [OMPI users] new hwloc error

2015-04-30 Thread Noam Bernstein
> On Apr 29, 2015, at 5:59 PM, Ralph Castain wrote: > > Try adding --hetero-nodes to the cmd line and see if that helps resolve the > problem. Of course, if all the machines are identical, then it won’t. They are identical, and the problem is new. That’s what’s most mysterious about it. Can …

[OMPI users] potential bug with MPI_Win_fence() in openmpi-1.8.4

2015-04-30 Thread Satish Balay
OpenMPI developers, We've had issues (memory errors) with OpenMPI - and code in PETSc library that uses MPI_Win_fence(). Valgrind shows memory corruption deep inside the OpenMPI function stack. I'm attaching a potential patch that appears to fix this issue for us. [the corresponding Valgrind trace is …
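
For context, here is a minimal, self-contained MPI_Win_fence() pattern of the kind at issue (a generic one-sided example, not the PETSc reproducer attached to the report):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs, *buf;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* One int slot per rank, exposed through an RMA window. */
    MPI_Win_allocate((MPI_Aint)nprocs * sizeof(int), sizeof(int),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &buf, &win);

    MPI_Win_fence(0, win);              /* open the access epoch      */
    MPI_Put(&rank, 1, MPI_INT, 0,       /* every rank targets rank 0  */
            rank, 1, MPI_INT, win);     /* at displacement = rank     */
    MPI_Win_fence(0, win);              /* close the epoch            */

    if (rank == 0)
        for (int i = 0; i < nprocs; i++)
            printf("slot %d = %d\n", i, buf[i]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```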