Hi,

We are planning to use Hadoop for some very expensive and long-running
processing tasks.
The processes we plan to run are very heavy in terms of CPU and memory
requirements: one process instance uses almost 100% of a CPU core and
around 300-400 MB of RAM.
The first time a process loads it can take around 1 to 1.5 minutes, but after
that we can feed it data and each item takes only a few seconds to process.
Can I model this on Hadoop?
Can I have my processes pre-loaded on the task-processing machines and have
Hadoop supply the data? This would save the 1 to 1.5 minutes of initial load
time that each task would otherwise pay.
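To make it concrete, here is a sketch of the pattern I have in mind (the class
and method names are made up, and the "engine" below is just a stand-in for my
real process): load the heavy resource once per JVM via a static singleton, so
that if Hadoop reuses the same task JVM, only the first task pays the load cost.

```java
public class PreloadDemo {
    // Stand-in for the expensive process; the real one takes ~1-1.5 min to load.
    static class Engine {
        static int loadCount = 0;           // counts how many times we loaded
        Engine() { loadCount++; /* expensive initialization would happen here */ }
        String process(String data) { return "processed:" + data; }
    }

    private static Engine engine;           // shared by all tasks in this JVM

    // In a real job this would be called from Mapper.setup();
    // the engine is loaded only on the first call per JVM.
    static synchronized Engine getEngine() {
        if (engine == null) engine = new Engine();
        return engine;
    }

    // Simulates several map tasks running inside one reused JVM:
    // the engine loads once, then each "task" only pays per-record cost.
    public static void main(String[] args) {
        for (int task = 0; task < 3; task++) {
            System.out.println(getEngine().process("record" + task));
        }
        System.out.println("loads=" + Engine.loadCount);
    }
}
```

My assumption is that this only helps if Hadoop actually schedules multiple
tasks into the same JVM; otherwise every task JVM still pays the load cost once.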
I want to run a number of these processes in parallel based on each machine's
capacity (e.g., 6 instances on an 8-CPU box), or using the capacity scheduler.
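For the per-machine limits, I was imagining configuration along these lines
(assuming I have read the docs correctly; these property names are from classic
MapReduce, and the values are just examples for my setup):

```xml
<!-- Reuse task JVMs indefinitely within a job, so the heavy
     process loads once per JVM rather than once per task -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>

<!-- Cap concurrent map tasks per node, e.g. 6 slots on an 8-CPU box -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
```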

Please let me know whether this is possible, and any pointers on how it can be
done.

Thanks,
Amit
