amjad ali wrote:
Hi,
thanks T.Prince,
Your statement:
"I'll just mention that we are well into the era of 3 levels of
programming parallelization: vectorization, threaded parallel (e.g.
OpenMP), and process parallel (e.g. MPI)." was really new and valuable
to me. Now I understand it better.
Can you please explain a bit about:
" This application gains significant benefit from cache blocking, so
vectorization has more opportunity to gain than for applications which
have less memory locality."
So should I conclude from your reply that even with a single-core
processor in a PC, we can still benefit from auto-vectorization?
And that we do not need free cores to benefit from auto-vectorization?
Thank you very much.

Yes, we were using auto-vectorization before the beginnings of MPI,
back in the days of single-core CPUs; in fact, it would often show a
greater gain then than it did on later multi-core CPUs.
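
As a rough illustration (not taken from this thread), a loop of the following shape is the kind a compiler can usually auto-vectorize on one core, with no threads or extra processes involved. The function name saxpy and its signature are only assumptions for the example; the restrict qualifiers tell the compiler the arrays do not overlap, which is often what allows it to vectorize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * Independent iterations with unit stride are the typical case that
 * auto-vectorizers handle. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* restrict promises no overlap; catch the obvious violation. */
    assert(n == 0 || x != y);

    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With GCC, vectorization is enabled at -O3 (or with -ftree-vectorize); other compilers have their own switches and reports.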
The reason auto-vectorization is more effective with cache blocking,
and possibly on single-core CPUs, is that the memory bus is less
saturated: vector code only runs faster if data reaches it fast enough.
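
To make the cache-blocking idea concrete, here is a minimal sketch using matrix multiply as a stand-in; it is not the application discussed in this thread. The name matmul_blocked, the helper min_size, and the tile size parameter bs are all assumptions for illustration. Each tile is reused while it is still in cache, and the innermost loop is unit-stride so the compiler has a chance to vectorize it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smaller of two sizes. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* C = A * B for n x n row-major matrices, processed in bs x bs tiles
 * so that the working set of each tile step can stay in cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *restrict a,
                    const double *restrict b,
                    double *restrict c)
{
    assert(bs > 0);

    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = min_size(ii + bs, n);
                size_t k_end = min_size(kk + bs, n);
                size_t j_end = min_size(jj + bs, n);

                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Unit-stride inner loop: the candidate
                         * for auto-vectorization. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

A suitable bs depends on the cache sizes of the machine, so it would need to be tuned by measurement rather than taken from this sketch.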