Thanks everyone. Very good discussion. Thanks, Jacek, for the code snippet. I downloaded your Mastering Apache Spark PDF. I love it.
I have one more question.

On Sat, Nov 12, 2016 at 2:21 PM, Sean McKibben <grap...@graphex.com> wrote:

> I think one of the advantages of using akka-streams within Spark is the
> fact that it is a general-purpose stream processing toolset with
> backpressure, not necessarily specific to Kafka. If things work out with
> the approach, Spark could be a great benefit to use as a coordination
> framework for discrete streams processed on each executor. I've been
> toying with the idea of making essentially an RDD of task messages, where
> each task becomes an Akka stream which is materialized on multiple
> executors and completed as that executor's 'task', allowing Spark to
> coordinate the completion of the entire job. For example, I might make an
> RDD which is just a set of URLs that I want to download and produce to
> Kafka, but let's say I have so many URLs that I need to coordinate that
> work across many servers. Using Spark with a foreachPartition block, I
> might set up an akka-stream to accomplish that task in a backpressured,
> stream-oriented way, so that I could have the entire Spark job complete
> when all of the URLs had been produced to Kafka, using individual Akka
> Streams within each executor.
>
> I realize that this is not the original question on this thread, and I
> don't mean to hijack that. I am also interested in the potential of Akka
> Stream sources for a Spark Streaming job directly, which could potentially
> be adapted for both Kafka and non-Kafka use cases, with the emphasis for
> me being on use cases which aren't necessarily Kafka specific. There are
> some portions which feel like a bit of a mismatch, but with Structured
> Streaming, I think there is greater opportunity for some kind of symbiotic
> adapter layer on the input side of things.
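Sean's foreachPartition idea might be sketched roughly as below, written as it could appear in a spark-shell session (so `sc` is in scope), and assuming the akka-stream and akka-stream-kafka (Reactive Kafka) artifacts are on the executor classpath. The broker address, topic name, URLs, and the naive fetch step are all placeholders, not anything from the thread:

```scala
import akka.actor.ActorSystem
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// An RDD of "task messages" -- here just the URLs to download.
val urls = sc.parallelize(Seq("http://example.com/a", "http://example.com/b"))

urls.foreachPartition { partition =>
  // One ActorSystem and one materialized Akka Stream per executor task.
  implicit val system = ActorSystem("url-downloader")
  implicit val mat = ActorMaterializer()
  import system.dispatcher

  val producerSettings =
    ProducerSettings(system, new StringSerializer, new StringSerializer)
      .withBootstrapServers("broker:9092") // placeholder address

  val done = Source.fromIterator(() => partition)
    // Fetch each URL with bounded parallelism; mapAsync propagates
    // backpressure from the Kafka sink back to the fetching stage.
    .mapAsync(parallelism = 4) { url =>
      Future(scala.io.Source.fromURL(url).mkString)
    }
    .map(body => new ProducerRecord[String, String]("downloads", body))
    .runWith(Producer.plainSink(producerSettings))

  // Block so the Spark task completes only when every URL in this
  // partition has been produced to Kafka; the job as a whole then
  // completes when all partitions have.
  Await.result(done, 1.hour)
  system.terminate()
}
```

This matches the coordination split Sean describes: Spark distributes and tracks the partitions, while each partition's work runs as a backpressured stream local to the executor.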
> I think the Apache Gearpump <https://gearpump.apache.org/overview.html>
> project in incubation may demonstrate how this adaptation can be
> approached, and the nascent Alpakka project
> <https://github.com/akka/alpakka> is an example of the generic
> applications of Akka Streams.
>
> It is important to note that Akka Streams are billed as a toolbox and not
> a framework, because they don't handle coordination of parallelism or
> multi-host concurrency. I think Spark could end up being a very convenient
> framework to handle this aspect of a distributed application's
> architecture. It may be able to do some of this without any modification
> to either of these projects, but I haven't had the experience of actually
> attempting the implementation yet.
>
> On Nov 12, 2016, at 9:42 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>
> Hi Luciano,
>
> Mind sharing why to have a structured streaming source/sink for Akka
> if Kafka's available and Akka Streams has a Kafka module? #curious
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Sat, Nov 12, 2016 at 4:07 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
> If you are interested in Akka streaming, it is being maintained in Apache
> Bahir. For Akka there isn't a structured streaming version yet, but we
> would be interested in collaborating on the structured streaming version
> for sure.
>
> On Thu, Nov 10, 2016 at 8:46 AM shyla deshpande <deshpandesh...@gmail.com>
> wrote:
>
> I am using Spark 2.0.1. I wanted to build a data pipeline using Kafka,
> Spark Streaming and Cassandra using Structured Streaming. But the Kafka
> source support for Structured Streaming is not yet available. So now I am
> trying to use Akka Stream as the source to Spark Streaming.
>
> Want to make sure I am heading in the right direction.
> Please direct me to any sample code and reading material for this.
>
> Thanks
>
> --
> Sent from my Mobile device
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
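For the original question (an Akka source for a Spark Streaming job), the Bahir connector Luciano mentions exposes a receiver-based DStream through AkkaUtils. A minimal sketch, assuming the spark-streaming-akka artifact is on the classpath; the app name, batch interval, receiver name, and word-count logic are illustrative choices, not anything prescribed by the thread:

```scala
import akka.actor.Props
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.akka.{ActorReceiver, AkkaUtils}

// A receiver actor: everything it `store`s becomes records in the DStream.
class StringReceiver extends ActorReceiver {
  def receive: Receive = {
    case s: String => store(s)
  }
}

val conf = new SparkConf().setAppName("akka-source-sketch")
val ssc = new StreamingContext(conf, Seconds(5))

// Creates a ReceiverInputDStream[String] backed by the actor above.
val lines = AkkaUtils.createStream[String](ssc, Props[StringReceiver], "string-receiver")

lines.flatMap(_.split(" ")).countByValue().print()

ssc.start()
ssc.awaitTermination()
```

A feeder (on the driver or in an external process) would then send strings to the receiver actor to drive the stream. Note this is a DStream source, not a Structured Streaming one, which matches Luciano's point that the structured streaming version does not exist yet.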