On Tue, 2006-08-15 at 14:24 -0700, Tom Rosmond wrote:
> I am continuing to test the MPI-2 features of 1.1, and have run into
> some puzzling behavior. I wrote a simple F90 program to test 'mpi_put'
> and 'mpi_get' on a coordinate transformation problem on a two dual-core
> processor Opteron workstation running the PGI 6.1 compiler. The program
> runs correctly for a variety of problem sizes and processor counts.
>
> However, my main interest is a large global weather prediction model
> that has been running in production with 1-sided message passing on an
> SGI Origin 3000 for several years. This code does not run with OMPI
> 1-sided message passing. I have investigated the difference between this
> code and the test program and noticed a critical difference. Both
> programs call 'mpi_win_create' to create an integer 'handle' to the RMA
> window used by 'mpi_put' and 'mpi_get'. In the test program this
> 'handle' returns with a value of '1', but in the large code the 'handle'
> returns with value '0'. Subsequent synchronization calls to
> 'mpi_win_fence' succeed in the small program (error status eq 0), while
> in the large code they fail (error status ne 0), and the transfers fail
> also (no data is passed).
>
> Do you have any suggestions on what could cause this difference in
> behavior between the two codes, specifically why the 'handles' have
> different values? Are there any diagnostics I could produce that would
> provide information?

The difference in handle values is almost certainly related to the
failures you are seeing. Our handle 0 is MPI_WIN_NULL, so you should
never see that returned from a successful MPI_WIN_CREATE. A handle of 0
suggests the window was never created, which would explain why the
later MPI_WIN_FENCE calls and the transfers fail.

Unfortunately, when I wrote the one-sided implementation, I didn't add
useful debugging messages that the user can enable. I can add some and
make a tarball, if you would be willing to give it a try. What error
messages are coming out of the large code?

By the way, just to make sure your expectations are set correctly, Open
MPI's one-sided performance in v1.1 and v1.2 is bad, as it's implemented
over the point-to-point engine. You're not going to get Origin-like
performance out of the current implementation.

Brian