Hi Mark. This was very much exploratory work, and a lot of it was just about learning the Spark paradigms. That being said, merging for future work seems appropriate, but it's not clear yet whether I will be pursuing this work further. I might wind up using Shark instead [I would love to use Cascading over Spark as well, if it existed].
I'd like to know what issues you ran into getting the code/examples to work. I had a couple of people try this out from scratch on clean systems, and it worked for them.

Serializing the functions is necessary as far as I can tell; it would not work for me without it. My best guess (and this is largely guesswork) is that each time an anonymous function is compiled on a different JVM it gets a different class name (e.g. fn_123). There is a high likelihood that the name assigned on the master is not the same as the name on the task JVMs, so you wind up with a ClassNotFoundException. I don't know why this would work for you; if you have any insight into it, I would love to hear it. (A small sketch of what I think is happening is at the bottom of this message.)

Marc

On Tue, Jan 22, 2013 at 8:09 AM, Mark Hamstra <markhams...@gmail.com> wrote:
> Hmmm... a lot of duplicated work. Sorry I didn't get my stuff into a more
> usable form for you, but I wasn't aware that anybody was even interested
> in it. I've got some stuff that I want to rework a little, and I'm still
> thinking through the best way to integrate with the new reducers code in
> Clojure, but I haven't had the right combination of time and motivation
> to finish off what I started and document it. At any rate, we should work
> at merging the two efforts, since I don't see any need for duplicate APIs.
>
> In taking a quick first pass at it, I wasn't able to get your code and
> examples to work, but I'm curious what your reasoning is for using
> serializable.fn and avoiding clojure.core/fn or #(). I'm not sure that is
> strictly necessary. For example, the following works just fine with my API:
>
> (require 'spark.api.clojure.core)
>
> (wrappers!) ; one of the pieces I want to re-work, but allows functions like map to work with either Clojure collections or RDDs
>
> (set-spark-context! "local[4]" "cljspark")
>
> (def rdd (parallelize [1 2 3 4]))
>
> (def mrdd1 (map #(+ 2 %) rdd))
>
> (def result1 (collect mrdd1))
>
> (def offset1 4)
>
> (def mrdd2 (map #(+ offset1 %) rdd))
>
> (def result2 (collect mrdd2))
>
> (def mrdd3 (map (let [offset2 5] #(+ offset2 %)) rdd))
>
> (def result3 (collect mrdd3))
>
> That will result in result1, result2, and result3 being [3 4 5 6],
> [5 6 7 8], and [6 7 8 9] respectively, without any need for serializable.fn.
>
> On Tuesday, January 22, 2013 6:55:53 AM UTC-8, Marc Limotte wrote:
>
>> A Clojure API for the Spark project. I am aware that there is another
>> Clojure Spark wrapper project which looks very interesting; this project
>> has similar goals. Like that project, it is not absolutely complete, but
>> it does have some documentation and examples, and it is usable and should
>> be easy enough to extend as needed. This is the result of about three
>> weeks of work. It handles many of the initial problems, like serializing
>> anonymous functions, converting back and forth between Scala Tuples and
>> Clojure seqs, and converting RDDs to PairRDDs.
>>
>> The project is available here:
>>
>> https://github.com/TheClimateCorporation/clj-spark
>>
>> Thanks to The Climate Corporation for allowing me to release it. At
>> Climate, we do the majority of our Big Data work with Cascalog (on top of
>> Cascading). I was looking into Spark for some of the benefits that it
>> provides. I suspect we will explore Shark next, and may work it into our
>> processes for some of our more ad hoc/exploratory queries.
>>
>> I think it would be interesting to see a Cascading planner on top of
>> Spark, which would enable Cascalog queries (mostly) for free. I suspect
>> that might be a superior method of using Clojure on Spark.
>>
>> Marc Limotte
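A minimal sketch of the class-name problem described above, assuming the serializable-fn library (the serializable.fn namespace that clj-spark uses) is on the classpath. The surrounding names are illustrative, not part of the clj-spark API:

(require '[serializable.fn :as sfn])

;; A plain anonymous fn compiles to a JVM class with a generated name
;; (something like user$fn__456). That class exists only in the JVM that
;; compiled it, so a worker that tries to deserialize the closure by class
;; name can fail with ClassNotFoundException.
(def plain-inc #(+ 1 %))
(class plain-inc)    ; => a generated class, e.g. user$fn__456

;; A serializable fn keeps its own source form, so it can be shipped as
;; data and rebuilt on the worker JVM instead of being loaded by class
;; name. Locally it still behaves like an ordinary function.
(def ser-inc (sfn/fn [x] (+ 1 x)))
(ser-inc 41)         ; => 42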
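And a small sketch of the Scala Tuple <-> Clojure seq conversion mentioned in the announcement. This is plain Scala/Java interop rather than clj-spark's own helpers, and it assumes a Spark (or Scala) jar providing scala.Tuple2 is on the classpath:

(import 'scala.Tuple2)

(defn tuple->vec
  "Convert a scala.Tuple2 (as used by PairRDDs) into a Clojure vector [k v]."
  [^Tuple2 t]
  [(._1 t) (._2 t)])

(defn vec->tuple
  "Convert a two-element Clojure collection into a scala.Tuple2."
  [[k v]]
  (Tuple2. k v))

;; Round trip:
(tuple->vec (vec->tuple [:a 1]))   ; => [:a 1]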