Hi,

I was comparing a simple Send-Recv program against another program that does two memcpy's to and from shared memory. Both used 2 processes, with array sizes ranging from 10^6 to 10^8 doubles, on IA64. With the --mca btl sm,self options, the Send-Recv version gets almost twice the bandwidth of the two-memcpy version. I looked at the Open MPI source and cannot tell whether it uses anything other than plain glibc memcpy. I must be missing something. Can somebody please help?
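To make the comparison concrete, here is a minimal sketch of the kind of two-memcpy baseline described above. It is not the actual benchmark: the function name two_memcpy_bandwidth, the anonymous shared mapping, and the single-process layout (no sender/receiver synchronization) are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Hypothetical helper: copy n doubles src -> shared staging buffer -> dst,
 * repeated reps times, and return the payload bandwidth in bytes/second
 * (or -1.0 on failure). In a real two-process test the first memcpy would
 * run in the sender and the second in the receiver; here both run in one
 * process to keep the sketch short. */
double two_memcpy_bandwidth(const double *src, double *dst, size_t n, int reps)
{
    assert(src != NULL && dst != NULL);

    size_t bytes = n * sizeof(double);

    /* Anonymous shared mapping stands in for the shared-memory segment. */
    void *shm = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++) {
        memcpy(shm, src, bytes);   /* "send": user buffer -> shared memory */
        memcpy(dst, shm, bytes);   /* "recv": shared memory -> user buffer */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(shm, bytes);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    if (secs <= 0.0)
        return -1.0;

    /* Count each payload byte once per repetition, as a Send-Recv
     * bandwidth figure would. */
    return (double)bytes * (double)reps / secs;
}
```

Note that this version copies the whole array in one piece and runs the two copies strictly one after the other, so the second memcpy cannot start until the first has finished.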
Thanks,
Nilesh

P.S. I was not sure whether to post this message to the users list or the devel list, so I posted to both. Apologies for the cross-posting.