A Clojure API for the Spark project. I am aware that there is another Clojure Spark wrapper project, which looks very interesting; this project has similar goals. Like that project, it is not complete, but it does have some documentation and examples. It is usable and should be easy enough to extend as needed. This is the result of about three weeks of work. It handles many of the initial problems, such as serializing anonymous functions, converting back and forth between Scala Tuples and Clojure seqs, and converting RDDs to PairRDDs.
The project is available here: https://github.com/TheClimateCorporation/clj-spark

Thanks to The Climate Corporation for allowing me to release it. At Climate, we do the majority of our Big Data work with Cascalog (on top of Cascading). I was looking into Spark for some of the benefits that it provides. I suspect we will explore Shark next, and may work it into our processes for some of our more ad hoc, exploratory queries.

I think it would be interesting to see a Cascading planner on top of Spark, which would enable Cascalog queries (mostly) for free. I suspect that might be a superior method of using Clojure on Spark.

Marc Limotte