Re: [OMPI users] SM btl slows down bandwidth?
At this time, we are not using non-temporal stores for shared memory operations.

On Aug 13, 2008, at 11:46 AM, Ron Brightwell wrote:

> [...]
>
> MPICH2 manages to get about 5GB/s in shared memory performance on the
> Xeon 5420 system.
>
> Does the sm btl use a memcpy with non-temporal stores like MPICH2?
> This can be a big win for bandwidth benchmarks that don't actually
> touch their receive buffers at all...
>
> -Ron

--
Jeff Squyres
Cisco Systems
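For readers who haven't run into the technique Ron mentions, below is a minimal sketch of a copy loop built on SSE2 streaming (non-temporal) stores. It is purely illustrative, not code from Open MPI or MPICH2; the function name copy_nontemporal and the memcpy fallback are made up for the example. The point is that _mm_stream_si128 writes around the cache, which is exactly what helps a bandwidth benchmark whose receiver never reads the data.

    /* Illustrative sketch only -- NOT Open MPI's or MPICH2's actual code.
     * Shows a copy that uses non-temporal (streaming) stores so the
     * destination buffer is not pulled into the cache.
     * Assumes an SSE2-capable x86 compiler. */

    #include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
    #include <stdint.h>
    #include <string.h>

    static void copy_nontemporal(void *dst, const void *src, size_t len)
    {
        /* Fall back to plain memcpy when the destination is not 16-byte
         * aligned or the length is not a multiple of 16 bytes. */
        if (((uintptr_t) dst & 15) != 0 || (len & 15) != 0) {
            memcpy(dst, src, len);
            return;
        }

        __m128i *d = (__m128i *) dst;
        const __m128i *s = (const __m128i *) src;

        for (size_t i = 0; i < len / 16; ++i) {
            /* Unaligned load from the source; the store below bypasses
             * the cache (non-temporal). */
            __m128i v = _mm_loadu_si128(s + i);
            _mm_stream_si128(d + i, v);
        }

        /* Make the streaming stores globally visible before returning. */
        _mm_sfence();
    }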
Re: [OMPI users] Setting up Open MPI to run on multiple servers
On Aug 13, 2008, at 9:58 PM, Rayne wrote:

> I just tried to explicitly specify where 32.out is on the server when
> using mpirun, and it worked. So the problem I had earlier did lie in
> the server not being able to find 32.out. So what should I do so that
> I don't have to explicitly specify the location of the program every
> time I run mpirun? I tried including the directory under PATH in
> .bash_profile on my server, where 32.out should run, and restarted the
> server, but it didn't work.

This shouldn't be necessary -- if both 32.out and 64.out are in the same
directory and you're *in* that directory, then OMPI should find them,
because we add "." to the PATH. For example:

    shell$ ls
    32.out  64.out
    shell$ mpirun --host 32bithost.example.com -np 1 32.out : \
               --host 64bithost.example.com -np 1 64.out

> Also, since having the 32-bit server run the 32-bit program and the
> 64-bit PC run the 64-bit program works, I guess it means my server
> cannot run the program compiled by my PC and hence, the mpirun failed
> when trying to get both the PC and server to run the same program
> compiled by the PC.

Keep in mind that you have to have had OMPI compiled for heterogeneous
operation. I think that worked for some transports back in v1.2 (TCP?).

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] SM btl slows down bandwidth?
Interestingly enough, on the SPARC platform the Solaris memcpy() routines
actually use non-temporal stores for copies >= 64KB. By default, some of
the MCA parameters to the sm BTL stop at 32KB. I've experimented with
bumping the sm segment sizes above 64KB and seen incredible speedups on
our M9000 platforms. I'm looking for a clean way to integrate into Open
MPI a memcpy that lowers this non-temporal boundary to 32KB or below. I
have not looked into whether the Solaris x86/x64 memcpy uses non-temporal
stores or not.

--td

On Aug 14, 2008, at 9:28 AM, Jeff Squyres wrote:

> At this time, we are not using non-temporal stores for shared memory
> operations.
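For anyone who wants to try the same experiment, the sm BTL limits are ordinary MCA parameters and can be changed on the mpirun command line. A hedged sketch follows: the parameter name btl_sm_max_frag_size and the 32KB default reflect the 1.2-era sm BTL as I recall it, names have shifted between releases, and the benchmark binary is just a placeholder, so check ompi_info for your own build first.

    shell$ ompi_info --param btl sm
    shell$ mpirun --mca btl_sm_max_frag_size 65536 -np 2 ./bw_test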