Hey all,

I've released an update to pallet-hadoop <https://github.com/pallet/pallet-hadoop> (0.3.0) that adds support for spot instances <http://aws.amazon.com/ec2/spot-instances/>. The example cluster below, when run on EC2, chooses large instances; a large spot instance costs 12 cents an hour, down from the usual 34 cents for an on-demand instance. (Current EMR pricing for large instances is up at 40 cents.)

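Here is a minimal sketch of what turning on spot instances looks like, written as plain Clojure data so it runs without Pallet on the classpath. The map is the :base-machine-spec from the example cluster in the original announcement quoted below; the var names base-machine-spec and spot-machine-spec are made up for illustration, and the 0.20 value is read here as a bid of 20 cents an hour.

```clojure
;; The :base-machine-spec map from the original example cluster.
;; (The var name is hypothetical, chosen for this sketch.)
(def base-machine-spec
  {:os-family :ubuntu
   :os-version-matches "10.10"
   :os-64-bit true
   :min-ram (* 4 1024)})

;; The same spec with a spot bid added. Only the :spot-price entry is new;
;; every other key is carried over unchanged.
(def spot-machine-spec
  (assoc base-machine-spec :spot-price (float 0.20)))
```

The resulting map would then be passed as the :base-machine-spec value in the cluster-spec call, in place of the literal map.
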
Creating a cluster of 20 cent spot instances is as easy as adding this entry to the :base-machine-spec map:

    :spot-price (float 0.20)

Future updates will focus on making it very easy to run Cascalog <http://www.assembla.com/wiki/show/d9Z8_q-Omr35zteJe5cbLr> jobs (and other jar deployments) on these clusters. The goal of all of this is to make Big Data analysis approachable, affordable and interactive. More to come!

~Sam

On Wed, Jun 1, 2011 at 10:26 PM, Sam Ritchie <sritchi...@gmail.com> wrote:

> Hey all,
>
> I'd like to announce Pallet-Hadoop
> <https://github.com/pallet/pallet-hadoop/tree/master>, a layer built on
> top of Pallet <https://github.com/pallet/pallet> that allows users to
> describe a Hadoop cluster configuration as a nested Clojure map. Here's a
> cluster with one master node and two slave nodes with some custom
> properties, all 64-bit machines with at least 4 gigs of RAM, running
> Ubuntu 10.10:
>
> (def example-cluster
>   (cluster-spec :private
>                 {:jobtracker (node-group [:jobtracker :namenode])
>                  :slaves (slave-group 2)}
>                 :base-machine-spec {:os-family :ubuntu
>                                     :os-version-matches "10.10"
>                                     :os-64-bit true
>                                     :min-ram (* 4 1024)}
>                 :base-props {:hdfs-site {:dfs.data.dir "/mnt/dfs/data"
>                                          :dfs.name.dir "/mnt/dfs/name"}
>                              :mapred-site {:mapred.task.timeout 300000
>                                            :mapred.reduce.tasks 3}}))
>
> Thanks to Pallet's flexibility and use of jclouds
> <https://github.com/jclouds/jclouds>, the cluster description can be
> written without reference to any specific cloud provider, and can be used
> to boot machines on any of the major cloud providers
> <https://github.com/jclouds/jclouds#readme> (or on local virtual
> machines!) with a simple change of credentials.
>
> This example project <https://github.com/pallet/pallet-hadoop-example>
> contains everything you need to get started; it walks through all steps
> necessary to boot a cluster and run the canonical word count example on
> Amazon's EC2 platform.
> The project wiki <https://github.com/pallet/pallet-hadoop/wiki> contains
> a lot more detail on the design and flexibility of the data structures
> involved.
>
> Future plans include intelligent default settings that adjust based on the
> specs of the cluster, and the ability to run Cascalog
> <https://github.com/nathanmarz/cascalog> queries on these distributed
> clusters from Cake and Leiningen.
>
> I'd love to hear what you all think about this! Huge thanks to Hugo Duncan
> <http://hugoduncan.org/> for getting this started, and to Toni Batchelli
> <http://tbatchelli.org/> for his excellent work on this project and its
> foundation, Pallet's new Hadoop crate
> <https://github.com/pallet/pallet-apache-crates>.
>
> Cheers,
> Sam