On Tue, 2006-08-15 at 14:24 -0700, Tom Rosmond wrote:
> I am continuing to test the MPI-2 features of 1.1, and have run into 
> some puzzling behavior. I wrote a simple F90 program to test 'mpi_put' 
> and 'mpi_get' on a coordinate transformation problem on a two dual-core 
> processor Opteron workstation running the PGI 6.1 compiler. The program 
> runs correctly for a variety of problem sizes and processor counts.
> 
> However, my main interest is a large global weather prediction model 
> that has been running in production with 1-sided message passing on an 
> SGI Origin 3000 for several years. This code does not run with OMPI 
> 1-sided message passing. I have investigated the difference between this 
> code and the test program and noticed a critical difference. Both 
> programs call 'mpi_win_create' to create an integer 'handle' to the RMA 
> window used by 'mpi_put' and 'mpi_get'. In the test program this 
> 'handle' returns with a value of '1', but in the large code the 'handle' 
> returns with value '0'. Subsequent synchronization calls to 
> 'mpi_win_fence' succeed in the small program (error status eq 0), while 
> in the large code they fail (error status ne 0), and the transfers fail 
> also (no data is passed).
> 
> Do you have any suggestions on what could cause this difference in 
> behavior between the two codes, specifically why the 'handles' have 
> different values? Are there any diagnostics I could produce that would 
> provide information?

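The difference in handle values is very relevant to the failures you are
seeing.  Our handle 0 is MPI_WIN_NULL, so you should never see that
returned from MPI_WIN_CREATE.  Calling MPI_WIN_FENCE on MPI_WIN_NULL is
an error, which would explain both the failed fences and the missing
data.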

Unfortunately, when I wrote the one-sided implementation, I didn't add
useful debugging messages the user can enable.  I can add some and make
a tarball, if you would be willing to give it a try.  What error
messages are coming out of the large code?
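
In the meantime, here is a minimal sketch of the kind of check that
might help narrow things down.  It is not taken from your code: the
program name, buffer, and sizes are placeholders, and I am assuming the
'mpif.h' interface.  It switches the error handlers to
MPI_ERRORS_RETURN and prints the text for any error code returned by
MPI_WIN_CREATE or MPI_WIN_FENCE.

```fortran
program win_diag
  implicit none
  include 'mpif.h'

  integer, parameter :: n = 1024          ! placeholder window length
  integer :: buf(n)
  integer :: win, ierr, ierr2, rlen, intsize
  ! The window size argument must be of kind MPI_ADDRESS_KIND,
  ! not a default integer.
  integer(kind=MPI_ADDRESS_KIND) :: winsize
  character(len=MPI_MAX_ERROR_STRING) :: msg

  call mpi_init(ierr)

  ! Errors from mpi_win_create are raised on the communicator, so ask
  ! for error codes to be returned instead of aborting the job.
  call mpi_comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN, ierr)

  call mpi_type_size(MPI_INTEGER, intsize, ierr)
  winsize = int(n, MPI_ADDRESS_KIND) * intsize
  buf = 0

  call mpi_win_create(buf, winsize, intsize, MPI_INFO_NULL, &
                      MPI_COMM_WORLD, win, ierr)
  if (ierr /= MPI_SUCCESS .or. win == MPI_WIN_NULL) then
     call mpi_error_string(ierr, msg, rlen, ierr2)
     print '(a,i0,a,a)', 'mpi_win_create: handle = ', win, &
                         ', error = ', msg(1:rlen)
     call mpi_abort(MPI_COMM_WORLD, 1, ierr2)
  end if

  ! Errors from later RMA calls are raised on the window itself.
  call mpi_win_set_errhandler(win, MPI_ERRORS_RETURN, ierr)

  call mpi_win_fence(0, win, ierr)
  if (ierr /= MPI_SUCCESS) then
     call mpi_error_string(ierr, msg, rlen, ierr2)
     print '(a,a)', 'mpi_win_fence: ', msg(1:rlen)
  end if

  call mpi_win_free(win, ierr)
  call mpi_finalize(ierr)
end program win_diag
```

Adding the same checks around the MPI_WIN_CREATE and MPI_WIN_FENCE
calls in the large code should at least show which call fails first and
with what error class.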

By the way, just to make sure your expectations are set correctly, Open
MPI's one-sided performance in v1.1 and v1.2 is poor, because it is
implemented on top of the point-to-point engine.  You're not going to
get Origin-like performance out of the current implementation.

Brian

