> These are all good ideas. The other trick - which has been discussed
> recently in the context of the Platform Scheduler - is to run HDFS across
> all nodes, but switch the workload of the cluster between Hadoop jobs
> (MR, Graph, Hamster) and other work (Grid jobs). That way the
> filesystem is just a very large FS for anything. If some grid jobs don't
> use HDFS, the nodes can still serve up their data.
This used to be called Hadoop On Demand (HoD), which deployed a MapReduce cluster on demand, using Torque to allocate nodes. :-)

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC

(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)