amjad ali wrote:
Hi,
thanks T.Prince,

You said:
"I'll just mention that we are well into the era of 3 levels of programming parallelization: vectorization, threaded parallel (e.g. OpenMP), and process parallel (e.g. MPI)." This was really valuable new knowledge for me. Now I understand it better.


Can you please explain a bit about:

" This application gains significant benefit from cache blocking, so vectorization has more opportunity to gain than for applications which have less memory locality."

So should I conclude from your reply that even with a single-core processor in a PC we can still benefit from auto-vectorization, and that we do not need free cores to benefit from it?

Thank you very much.
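Yes. We were using auto-vectorization before MPI existed, back in the days of single-core CPUs; in fact, it would often show a greater gain then than it did on later multi-core CPUs. Vectorization uses the SIMD units inside a single core, so it does not need free cores. Auto-vectorization is more effective with cache blocking, and possibly on single-core CPUs, because the memory bus is less saturated: a vectorized loop consumes data faster, so it gains most when its operands come from cache instead of waiting on memory.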
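
To make the cache blocking point concrete, here is a minimal sketch in C. It is not taken from the application discussed in this thread; it is a plain matrix multiply chosen for illustration. The function names and the tile size BLOCK = 64 are assumptions of mine, and the best tile size depends on the cache sizes of the machine. The idea is that each tile of b and c stays in cache while it is reused, and the innermost loop runs at unit stride, so the compiler can auto-vectorize it on a single core.

```c
#include <stddef.h>

/* Hypothetical tile size: choose it so the working tiles fit in cache. */
#define BLOCK 64

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Unblocked reference: c += a * b, all n x n, row-major. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}

/* Cache-blocked version: the same arithmetic, done tile by tile.
 * restrict promises the compiler that a, b and c do not overlap,
 * which removes one common obstacle to auto-vectorization. */
void matmul_blocked(size_t n, const double *restrict a,
                    const double *restrict b, double *restrict c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                const size_t i_end = min_sz(ii + BLOCK, n);
                const size_t k_end = min_sz(kk + BLOCK, n);
                const size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        const double aik = a[i * n + k];
                        /* Unit-stride inner loop over data already in
                         * cache: the candidate for vectorization. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

With GCC, for example, compiling with -O3 enables the vectorizer and -fopt-info-vec reports which loops were vectorized. Whether the blocked version is actually faster on a given machine has to be measured; for matrices that already fit in cache, blocking should make little difference.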
