Re: [ANN] Pallet-Hadoop (Hadoop clusters as data structures)

Sam Ritchie Fri, 24 Jun 2011 15:37:04 -0700

Hey all,

I've released an update to pallet-hadoop
<https://github.com/pallet/pallet-hadoop> (0.3.0) that adds support for spot
instances <http://aws.amazon.com/ec2/spot-instances/>. The example
cluster below, when run on EC2, chooses large instances; a large spot
instance costs 12 cents an hour, down from the usual 34 cents for an
on-demand instance. (Current EMR pricing for large instances is up at 40
cents.)
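
To put those numbers in context, here is a rough sketch of the hourly cost of
the three-node example cluster (one master, two slaves) at the prices quoted
above. The helper name hourly-cost is made up for illustration and is not part
of pallet-hadoop; spot prices fluctuate, so treat the result as a snapshot.

```clojure
;; Prices in cents per instance-hour, as quoted above (June 2011).
(def spot-cents 12)
(def on-demand-cents 34)

;; Hypothetical helper: hourly cost in cents for n instances.
(defn hourly-cost [n cents-per-instance]
  (* n cents-per-instance))

(hourly-cost 3 spot-cents)       ; => 36
(hourly-cost 3 on-demand-cents)  ; => 102
```

At these prices the three-node cluster comes to 36 cents an hour on spot
instances versus 102 cents on demand.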


Creating a cluster of 20-cent spot instances is as easy as adding this
entry to the :base-machine-spec map:

:spot-price (float 0.20)
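
As a minimal sketch, here is the :base-machine-spec map from the example
cluster with that entry added. It uses plain Clojure maps only; the var names
base-machine-spec and spot-machine-spec are mine, chosen for illustration, and
no Pallet functions are called.

```clojure
;; The machine spec from the example cluster, as a plain map.
(def base-machine-spec
  {:os-family :ubuntu
   :os-version-matches "10.10"
   :os-64-bit true
   :min-ram (* 4 1024)})

;; The same spec with the spot-price entry added.
(def spot-machine-spec
  (assoc base-machine-spec :spot-price (float 0.20)))
```

The resulting map would then be passed as the :base-machine-spec value in the
cluster-spec call, in place of the original map.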

Future updates will focus on making it very easy to run Cascalog
<http://www.assembla.com/wiki/show/d9Z8_q-Omr35zteJe5cbLr> jobs (and other
jar deployments) on these clusters. The goal of all of this is to make Big
Data analysis approachable, affordable and interactive. More to come!

~Sam

On Wed, Jun 1, 2011 at 10:26 PM, Sam Ritchie <sritchi...@gmail.com> wrote:

> Hey all,
>
> I'd like to announce Pallet-Hadoop
> <https://github.com/pallet/pallet-hadoop/tree/master>, a layer built on top
> of Pallet <https://github.com/pallet/pallet> that allows users to describe a
> Hadoop cluster configuration as a nested Clojure map. Here's a cluster with
> one master node and two slave nodes with some custom properties, all 64-bit
> machines with at least 4 GB of RAM, running Ubuntu 10.10:
>
>
> (def example-cluster
>   (cluster-spec :private
>                 {:jobtracker (node-group [:jobtracker :namenode])
>                  :slaves     (slave-group 2)}
>                 :base-machine-spec {:os-family :ubuntu
>                                     :os-version-matches "10.10"
>                                     :os-64-bit true
>                                     :min-ram (* 4 1024)}
>                 :base-props {:hdfs-site {:dfs.data.dir "/mnt/dfs/data"
>                                          :dfs.name.dir "/mnt/dfs/name"}
>                              :mapred-site {:mapred.task.timeout 300000
>                                            :mapred.reduce.tasks 3}}))
>
> Thanks to Pallet's flexibility and use of jclouds
> <https://github.com/jclouds/jclouds>, the cluster description can be written
> without reference to any specific cloud provider, and can be used to boot
> machines on any of the major cloud providers
> <https://github.com/jclouds/jclouds#readme> (or on local virtual machines!)
> with a simple change of credentials.
>
> This example project <https://github.com/pallet/pallet-hadoop-example>
> contains everything you need to get started; it walks through all steps
> necessary to boot a cluster and run the canonical word count example on
> Amazon's EC2 platform. The project wiki
> <https://github.com/pallet/pallet-hadoop/wiki> contains a lot more detail on
> the design and flexibility of the data structures involved.
>
> Future plans include intelligent default settings that adjust based on the
> specs of the cluster, and the ability to run Cascalog
> <https://github.com/nathanmarz/cascalog> queries on these distributed
> clusters from Cake and Leiningen.
>
> I'd love to hear what you all think about this! Huge thanks to Hugo Duncan
> <http://hugoduncan.org/> for getting this started, and to Toni Batchelli
> <http://tbatchelli.org/> for his excellent work on this project and its
> foundation, Pallet's new Hadoop crate
> <https://github.com/pallet/pallet-apache-crates>.
>
> Cheers,
> Sam
>
>
