I'm pretty sure that the benchmarks are correct. Our one-sided implementation is buggier than I'd like (indeed, I'm in the process of rewriting most of it as part of Open MPI's support for MPI-3's revised RDMA), so it's likely that the bugs are in Open MPI's one-sided support. Can you try a more recent release (something from the 1.5 tree) and see if the problem persists?
Thanks,

Brian

On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquy...@cisco.com> wrote:

>FWIW, I'm immediately suspicious of *any* MPI application that uses the
>MPI one-sided operations (i.e., MPI_PUT and MPI_GET). It looks like
>these two OSU benchmarks are using those operations.
>
>Is it known that these two benchmarks are correct?
>
>On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
>
>> Sorry, i forgot to introduce the system.. Ours is the customized OFED
>>stack implemented to work on the specific hardware.. We tested the stack
>>with the q-perf and Intel Benchmarks(IMB-3.2.2).. they went fine.. We
>>want to execute the osu_benchamark3.1.1 suite on our OFED..
>>
>> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
>><dvrao....@gmail.com> wrote:
>> Hiii,
>> I tried executing osu_benchamarks-3.1.1 suite with the openmpi-1.4.3...
>>I could run 10 bench-mark tests (except osu_put_bibw,osu_put_bw,
>>osu_get_bw,osu_latency_mt) out of 14 tests in the bench-mark suite... and
>>the remaining tests are hanging at some message size.. the output is
>>shown below
>>
>> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
>>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
>>orte_base_help_aggregate 0
>>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test1
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test2
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
>> # Size Bi-Bandwidth (MB/s)
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> 1 0.00
>> 2 0.00
>> 4 0.01
>> 8 0.03
>> 16 0.07
>> 32 0.15
>> 64 0.11
>> 128 0.21
>> 256 0.43
>> 512 0.88
>> 1024 2.10
>> 2048 4.21
>> 4096 8.10
>> 8192 16.19
>> 16384 8.46
>> 32768 20.34
>> 65536 39.85
>> 131072 84.22
>> 262144 142.23
>> 524288 234.83
>> mpirun: killing job...
>>
>>
>>--------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 7305 on node test2 exited
>>on signal 0 (Unknown signal 0).
>>
>>--------------------------------------------------------------------------
>> 2 total processes killed (some possibly by mpirun during cleanup)
>> mpirun: clean termination accomplished
>>
>> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
>>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
>>orte_base_help_aggregate 0
>>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test1
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> failed to create doorbell file /dev/plx2_char_dev
>>
>>--------------------------------------------------------------------------
>> WARNING: No preset parameters were found for the device that Open MPI
>> detected:
>>
>> Local host: test2
>> Device name: plx2_0
>> Device vendor ID: 0x10b5
>> Device vendor part ID: 4277
>>
>> Default device parameters will be used, which may result in lower
>> performance. You can edit any of the files specified by the
>> btl_openib_device_param_files MCA parameter to set values for your
>> device.
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_no_device_params_found to 0.
>>
>>--------------------------------------------------------------------------
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> alloc_srq max: 512 wqe_shift: 5
>> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
>> # Size Bandwidth (MB/s)
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> plx2_create_qp line: 415
>> 1 0.02
>> 2 0.05
>> 4 0.10
>> 8 0.19
>> 16 0.39
>> 32 0.77
>> 64 1.53
>> 128 2.57
>> 256 4.16
>> 512 8.30
>> 1024 16.62
>> 2048 33.22
>> 4096 66.51
>> 8192 42.45
>> 16384 11.99
>> 32768 18.20
>> 65536 76.04
>> 131072 98.64
>> 262144 407.66
>> 524288 489.84
>> mpirun: killing job...
>>
>>
>>--------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 7314 on node test2 exited
>>on signal 0 (Unknown signal 0).
>>
>>--------------------------------------------------------------------------
>> 2 total processes killed (some possibly by mpirun during cleanup)
>> mpirun: clean termination accomplished
>>
>> I even checked the logs but i couldn't see any errors...
>> Could you suggest a way to overcome/debug this issue..
>>
>> Thanks for the kind reply..
>>
>>
>> --
>> Thanks & Regards,
>> D.Venkateswara Rao,
>> Software Engineer,One Convergence Devices Pvt Ltd.,
>> Jubille Hills,Hyderabad.
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>--
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories