A Clojure API for the Spark project.  I am aware that there is another
Clojure Spark wrapper project, which looks very interesting.  This project
has similar goals.  Like that project, it is not absolutely complete, but
it does have some documentation and examples.  It is usable and should be
easy enough to extend as needed.  This is the result
of about three weeks of work.  It handles many of the initial problems like
serializing anonymous functions, converting back and forth between Scala
Tuples and Clojure seqs, and converting RDDs to PairRDDs.

The project is available here:

https://github.com/TheClimateCorporation/clj-spark

Thanks to The Climate Corporation for allowing me to release it.  At
Climate, we do the majority of our Big Data work with Cascalog (on top of
Cascading).  I was looking into Spark for some of the benefits that it
provides.  I suspect we will explore Shark next, and may work it into our
processes for some of our more ad hoc/exploratory queries.

I think it would be interesting to see a Cascading planner on top of Spark,
which would enable Cascalog queries (mostly) for free.  I suspect that
might be a superior method of using Clojure on Spark.

Marc Limotte
