On 7/12/2011 11:06 PM, Mohan, Ashwin wrote:
Tim,

Thanks for your message. I was, however, not clear about your suggestions and would 
appreciate it if you could clarify.

You say," So, if you want a sane comparison but aren't willing to study the compiler 
manuals, you might use (if your source code doesn't violate the aliasing rules) mpiicpc 
-prec-div -prec-sqrt -ansi-alias  and at least (if your linux compiler is g++) mpiCC -O2 
possibly with some of the other options I mentioned earlier."
###From your response above, I understand that for Intel I should use "mpiicpc -prec-div 
-prec-sqrt -ansi-alias", and for Open MPI, "mpiCC -O2". I am not certain about the 
other options you mention.
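Spelled out, the two invocations amount to something like the following sketch (the source name "main.cpp" and output name "xxx" are placeholders I've invented, not from the thread; the script only prints the command lines so they can be compared side by side):

```shell
# Intel MPI / icpc wrapper: precise divide/sqrt, plus a promise that the
# source obeys the aliasing rules (flags quoted from Tim's message).
INTEL="mpiicpc -prec-div -prec-sqrt -ansi-alias"

# Open MPI / g++ wrapper: at least -O2, per Tim's suggestion.
GNU="mpiCC -O2"

echo "$INTEL -o xxx main.cpp"
echo "$GNU -o xxx main.cpp"
```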

###Also, I presently use a hostfile when submitting my mpirun. Each node has four slots, and my 
hostfile reads "nodename slots=4". My compile command is "mpiCC -o xxx.xpp <filename>".
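A minimal sketch of the hostfile and launch described above (the node name "node01" and executable name "xxx" are placeholders, not from the thread):

```shell
# One-line hostfile: one node offering four slots.
cat > hostfile <<'EOF'
node01 slots=4
EOF

# The corresponding launch would then be:
#   mpirun -np 4 --hostfile hostfile ./xxx
```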

If you have as ancient a g++ as your indication of FC3 implies, it really isn't 
fair to compare it with a currently supported compiler.
###Do you suggest upgrading the current installation of g++? Would that help?
How much it would help would depend greatly on your source code. It won't help much anyway if you don't choose appropriate options. Current g++ is nearly as good at auto-vectorization as icpc, unless you dive into the pragmas and Cilk features provided with icpc. You really need to look at the gcc manual to understand those options; going into it in any more depth here would try the patience of the list.
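As a concrete way to try this out, one might compile a trivial vectorizable loop with a newer g++ and check that it builds. This is only a sketch: the file name and loop are illustrative, and the vectorization-report flag mentioned in the comment (-fopt-info-vec) needs a far newer GCC than FC3 shipped.

```shell
# A small loop that an auto-vectorizing compiler should handle.
cat > vec.cpp <<'EOF'
void axpy(float a, const float *x, float *y, int n) {
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
EOF

# -O3 enables -ftree-vectorize in GCC; on GCC >= 4.8 you could add
# -fopt-info-vec to see which loops actually vectorized.
if command -v g++ >/dev/null; then
    g++ -O3 -c vec.cpp -o vec.o
fi
```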

###How do I ensure that all 4 slots are active when I submit an "mpirun -np 4 <filename>" command? 
When I do "top", I notice that all 4 slots are active. I noticed this when I did 
"top" on the Intel machine too; that is, it showed four slots active.
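One way to see how many logical processors the kernel actually reports, independent of what "top" shows, is the following Linux-specific sketch (note that HyperThread siblings count as separate processors here):

```shell
# Each "processor" line in /proc/cpuinfo is one logical CPU
# as seen by the kernel.
grep -c '^processor' /proc/cpuinfo
```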

Thank you..ashwin.
I was having trouble inferring what platform you are running on; I guessed a single-core HyperThread, which doesn't seem to agree with your "4 slots" terminology. If you have 2 single-core HyperThreaded CPUs, it would be a very unusual application that finds a gain from running 2 MPI processes per core, but if the sight of 4 processes running on your graph was your goal, I won't argue against it. You should be aware that most clusters running HT-capable CPUs of that era have HT disabled in the BIOS setup.
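To check whether HT is actually enabled on a given machine, one sketch (assuming a reasonably recent util-linux, which is not something the thread discusses) is to ask lscpu for the threads-per-core count:

```shell
# "Thread(s) per core: 1" => HT off, or disabled in the BIOS;
# a value > 1 => HT/SMT is enabled.
if command -v lscpu >/dev/null; then
    lscpu | grep 'Thread(s) per core'
fi
```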

--
Tim Prince
