Hey Tim,

We at Amperity used Sparkling for our Clojure Spark interop in the past. After a few years of fighting with it, we eventually ended up writing sparkplug (https://github.com/amperity/sparkplug), which we now use to run all of our production Spark jobs. It has built-in support for proper function serialization, including wrappers around the Java RDD APIs. We also have some basic support for REPL interaction, though that is fairly limited. We run on a newer version of Spark (2.4.4) and haven't had issues with the library when upgrading or changing Spark versions.
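To make the function-serialization point concrete, here's a rough sketch of driving the Java RDD API directly from Clojure via plain interop (this is not sparkplug's own API; the local[*] master, app name, and reify-based Function are just illustrative):

(import '(org.apache.spark SparkConf)
        '(org.apache.spark.api.java JavaSparkContext)
        '(org.apache.spark.api.java.function Function))

;; Hypothetical local config, just for illustration.
(def conf
  (-> (SparkConf.)
      (.setMaster "local[*]")
      (.setAppName "interop-sketch")))

(def sc (JavaSparkContext. conf))

;; The Java API wants instances of Spark's Function interfaces, and whatever
;; you pass has to serialize cleanly out to the executors. A bare reify like
;; this only deserializes where its class is on the classpath (e.g.
;; AOT-compiled code), which is exactly the bookkeeping that serialization
;; wrappers take care of for you.
(def doubled
  (.map (.parallelize sc [1 2 3 4 5])
        (reify Function
          (call [_ x] (* 2 x)))))

(.collect doubled)
;; => [2 4 6 8 10]

With sparkplug you end up writing the same shape of code, but with the function wrapping and serialization handled for you; the README has the actual namespaces and entry points.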
Let me know if I can help if you're interested!

-Jeff

On Thursday, July 9, 2020 at 2:36:41 PM UTC-7, Tim Clemons wrote:
>
> I'm putting together a big data system centered around using Spark
> Streaming for data ingest and Spark SQL for querying the stored data. I've
> been investigating what options there are for implementing Spark
> applications using Clojure. It's been close to a decade since sparkling or
> flambo received any updates, and it doesn't look like either will
> accommodate recent distributions of Spark. I've found powderkeg an
> interesting option, and I like how it supports remote REPLs and the use of
> transducers rather than wrapped Scala fns. However, it looks like it has
> also gone a few years without commits, and I've heard loose talk that the
> developers have moved on to other pursuits.
>
> Part of the problem seems to be Spark. The project seems unapologetic
> about breaking interfaces and willing to sacrifice third-party code that
> tries to track Spark's development.
>
> So my options seem to be the following:
>
> 1. Deploy an older version of Spark that's compatible with one of the
> above-mentioned libraries. While we don't need to be bleeding edge,
> deploying a three-year-old version just to accommodate my preferred
> language is hard to justify.
>
> 2. Create a merge request to update one of those libraries to a more
> recent version of Spark and be prepared to maintain it internally for the
> lifespan of this project. This may be vastly overestimating my personal
> heroics.
>
> 3. Code my own solution from scratch using Java/Scala interop, sketching
> out just enough of a Clojure wrapper to suit my ends.
>
> 4. Learn Scala.
>
> I realize that Spark isn't the only game in town (Onyx, for example).
> However, I'm working with a team of developers who are not familiar with
> Clojure (though I'm working to be an advocate). I chose Spark as an
> established solution that supports multiple languages and handles both
> streaming and batch processing.
>
> Any insights? Any solutions I'm overlooking?