Hey Tim,

We at Amperity used Sparkling for our Clojure Spark interop in the past. After a few years of fighting with it, we eventually ended up writing sparkplug (https://github.com/amperity/sparkplug), which we now use to run all of our production Spark jobs. It has built-in support for proper function serialization, including wrappers around the Java RDD APIs. We also have some basic support for REPL interaction, though that is fairly limited. We run on a newer version of Spark (2.4.4) and haven't had issues with the library when upgrading or changing Spark versions.
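To make the function-serialization point concrete, here's a rough sketch of driving the Java RDD API directly from Clojure via plain interop (this is not sparkplug's own API; the local[*] master, app name, and reify-based Function are just illustrative):

(import '(org.apache.spark SparkConf)
        '(org.apache.spark.api.java JavaSparkContext)
        '(org.apache.spark.api.java.function Function))

;; Hypothetical local config, just for illustration.
(def conf
  (-> (SparkConf.)
      (.setMaster "local[*]")
      (.setAppName "interop-sketch")))

(def sc (JavaSparkContext. conf))

;; The Java API wants instances of Spark's Function interfaces, and whatever
;; you pass has to serialize cleanly out to the executors. A bare reify like
;; this only deserializes where its class is on the classpath (e.g.
;; AOT-compiled code), which is exactly the bookkeeping that serialization
;; wrappers take care of for you.
(def doubled
  (.map (.parallelize sc [1 2 3 4 5])
        (reify Function
          (call [_ x] (* 2 x)))))

(.collect doubled)
;; => [2 4 6 8 10]

With sparkplug you end up writing the same shape of code, but with the function wrapping and serialization handled for you; the README has the actual namespaces and entry points.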
Let me know if I can help if you're interested!

-Jeff

On Thursday, July 9, 2020 at 2:36:41 PM UTC-7, Tim Clemons wrote:
>
> I'm putting together a big data system centered around using Spark
> Streaming for data ingest and Spark SQL for querying the stored data. I've
> been investigating what options there are for implementing Spark
> applications using Clojure. It's been close to a decade since sparkling or
> flambo received any updates, and it doesn't look like either will
> accommodate recent distributions of Spark. I've found powderkeg an
> interesting option, and I like how it supports remote REPLs and the use of
> transducers rather than wrapped Scala fns. However, it looks like it has
> also gone a few years without commits, and I've heard loose talk that the
> developers have moved on to other pursuits.
>
> Part of the problem seems to be Spark. The project seems unapologetic
> about breaking interfaces and willing to sacrifice third-party code that
> tries to track Spark's development.
>
> So my options seem to be the following:
>
> 1. Deploy an older version of Spark that's compatible with one of the
> above-mentioned libraries. While we don't need to be bleeding edge,
> deploying a three-year-old version just to accommodate my preferred
> language is hard to justify.
>
> 2. Create a merge request to update one of those libraries to a more
> recent version of Spark and be prepared to maintain it internally for the
> lifespan of this project. This may be vastly overestimating my personal
> heroics.
>
> 3. Code my own solution from scratch using Java/Scala interop, sketching
> out just enough of a Clojure wrapper to suit my ends.
>
> 4. Learn Scala.
>
> I realize that Spark isn't the only game in town (Onyx, for example).
> However, I'm working with a team of developers who are not familiar with
> Clojure (though I'm working to be an advocate). I chose Spark as an
> established solution that supports multiple languages and handles both
> streaming and batch processing.
>
> Any insights? Any solutions I'm overlooking?