Hello,

I am currently working on a Win32 program that performs intensive 
calculations and is already multithreaded. As a result, it uses 
all the available cores on the PC it runs on.
The basic workflow is that the user opens a model and clicks the "start" 
button; the threads are then spawned, and once they have all finished, control 
is given back to the user.
While this works great, we have found that for larger models, the computation 
time is limited by the number of cores: there are still tasks waiting in the 
pool that could run in parallel if more cores were available.
As a result, we are investigating the possibility of using grid computing to 
multiply the number of available cores.
This, of course, poses technical challenges, and reading documentation on 
various websites led me to the Open MPI website and to this list.
I'm not sure this is the appropriate place to ask my questions; if it is not, 
please tell me what an appropriate place might be.

I understand that MPI is a framework that would facilitate the communication 
between the user's computer and the nodes that perform the distributed tasks.
What I have a hard time grasping are the following points:

What communication layer is used? How do I choose it?

What is the behavior in case a node dies or becomes unreachable?

What makes any given machine become a node available for tasks?

Is there some sort of load balancing?

Is there a monitoring tool that would give me indications of the status and 
health of the nodes?

How does the "MPI enabled" code get transferred to the nodes? If I understand 
things correctly, I would have to write a separate command-line exe that takes 
care of the tasks, and this would be the exe that gets sent over to the nodes.

I'm quite sure all of these are trivial questions for those with more 
experience, but I'm having a hard time finding resources that answer them.

Thanks in advance for your help.
Olivier
