Re: [OMPI users] SM btl slows down bandwidth?

2008-08-14 Thread Jeff Squyres
At this time, we are not using non-temporal stores for shared memory  
operations.



On Aug 13, 2008, at 11:46 AM, Ron Brightwell wrote:


>> [...]
>>
>> MPICH2 manages to get about 5GB/s in shared memory performance on the
>> Xeon 5420 system.
>
> Does the sm btl use a memcpy with non-temporal stores like MPICH2?
> This can be a big win for bandwidth benchmarks that don't actually
> touch their receive buffers at all...
>
> -Ron


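For context: "non-temporal" stores write to memory while bypassing the
cache, which helps when the destination buffer won't be read soon.  A
minimal sketch of such a memcpy using SSE2 intrinsics follows -- the
function name and alignment assumptions are hypothetical, and this is
not the MPICH2 or Open MPI implementation:

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical sketch: copy using non-temporal (streaming) stores.
 * Assumes dst and src are 16-byte aligned and n is a multiple of 16. */
static void memcpy_nontemporal(void *dst, const void *src, size_t n)
{
    __m128i       *d = (__m128i *) dst;
    const __m128i *s = (const __m128i *) src;
    for (size_t i = 0; i < n / 16; i++) {
        /* _mm_stream_si128 writes around the cache */
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
    }
    _mm_sfence();  /* make the streaming stores globally visible */
}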



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Setting up Open MPI to run on multiple servers

2008-08-14 Thread Jeff Squyres

On Aug 13, 2008, at 9:58 PM, Rayne wrote:

> I just tried to explicitly specify where 32.out is on the server
> when using mpirun, and it worked. So the problem I had earlier did
> lie in the server not being able to find 32.out. So what should I do
> so that I don't have to explicitly specify the location of the
> program every time I run mpirun? I tried adding the directory to
> PATH in .bash_profile on my server, where 32.out should run, and
> restarted the server, but it didn't work.


This shouldn't be necessary -- if both 32.out and 64.out are in the
same directory and you're *in* that directory, then OMPI should find
them, because we add "." to the PATH.  For example:


shell$ ls
32.out
64.out
shell$ mpirun --host 32bithost.example.com -np 1 32.out : \
  --host 64bithost.example.com -np 1 64.out
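
If running from that directory isn't convenient, one workaround is to
make the remote shell's PATH include the program's directory.  A
sketch, assuming bash on the remote node (non-interactive ssh shells
typically read ~/.bashrc rather than ~/.bash_profile, which may be why
the .bash_profile change didn't take); $HOME/mpi-progs is a
hypothetical stand-in for wherever 32.out lives:

shell$ echo 'export PATH=$PATH:$HOME/mpi-progs' >> ~/.bashrc
shell$ ssh 32bithost.example.com which 32.out   # verify the remote shell finds it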

> Also, since having the 32-bit server run the 32-bit program and the
> 64-bit PC run the 64-bit program works, I guess it means my server
> cannot run the program compiled by my PC, and hence mpirun failed
> when trying to get both the PC and the server to run the same
> program compiled by the PC.



Keep in mind that OMPI must have been compiled for heterogeneous
operation.  I think that worked for some transports back in v1.2 (TCP?).
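
A sketch of enabling that at build time, assuming a source build
(--enable-heterogeneous is the relevant configure flag, though which
versions and transports support it varies):

shell$ ./configure --enable-heterogeneous --prefix=$HOME/openmpi
shell$ make all install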


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] SM btl slows down bandwidth?

2008-08-14 Thread Terry Dontje
Interestingly enough, on the SPARC platform the Solaris memcpy
actually uses non-temporal stores for copies >= 64KB.  By default,
some of the mca parameters to the sm BTL stop at 32KB, so copies
through the sm BTL never reach that threshold.  I've experimented with
bumping the sm segment sizes to above 64K and seen incredible speedups
on our M9000 platforms.  I am looking for a nice way to integrate into
Open MPI a memcpy that lowers this boundary to 32KB or lower.

I have not looked into whether the Solaris x86/x64 memcpy uses
non-temporal stores or not.
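
For anyone who wants to try a similar experiment, a minimal sketch
(assuming 1.3-era sm BTL parameter names, which may differ in other
versions; ./bw_benchmark is a hypothetical placeholder):

shell$ ompi_info --param btl sm | grep -i size    # inspect the sm size limits
shell$ mpirun --mca btl_sm_max_send_size 131072 -np 2 ./bw_benchmark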


--td

Jeff Squyres wrote:
> At this time, we are not using non-temporal stores for shared memory
> operations.