Thanks everyone. Very good discussion.

Thanks, Jacek, for the code snippet. I downloaded your Mastering Apache
Spark PDF. I love it.

I have one more question,


On Sat, Nov 12, 2016 at 2:21 PM, Sean McKibben <grap...@graphex.com> wrote:

> I think one of the advantages of using akka-streams within Spark is the
> fact that it is a general purpose stream processing toolset with
> backpressure, not necessarily specific to kafka. If things work out with
> the approach, Spark could be of great benefit as a coordination
> framework for discrete streams processed on each executor. I've been toying
> with the idea of making essentially an RDD of task messages, where each
> task becomes an akka stream that is materialized on one of the executors
> and completed as that executor's 'task', allowing Spark to coordinate the
> completion of the entire job. For example, I might make an RDD which is
> just a set of URLs that I want to download and produce to Kafka, but let's
> say I have so many URLs that I need to coordinate that work across many
> servers. Using Spark with a foreachPartition block, I might set up an
> akka-stream to accomplish that task in a backpressured, stream-oriented
> way, so that I could have the entire Spark job complete when all of the
> URLs had been produced to Kafka, using individual Akka Streams within each
> executor.
>
> I realize that this is not the original question on this thread, and I
> don't mean to hijack that. I am also interested in the potential of Akka
> Stream sources for a Spark Streaming job directly, which could potentially
> be adapted for both Kafka and non-kafka use cases, with the emphasis for me
> being on use cases which aren't necessarily Kafka specific. There are some
> portions which feel like a bit of a mismatch, but with Structured Streaming,
> I think there is greater opportunity for some kind of symbiotic adapter
> layer on the input side of things. I think the Apache Gearpump
> <https://gearpump.apache.org/overview.html> project in incubation may
> demonstrate how this adaptation can be approached, and the nascent Alpakka
> project <https://github.com/akka/alpakka> is an example of the generic
> applications of Akka Streams.
>
> It is important to note that Akka Streams are billed as a toolbox and not
> a framework, because they don't handle coordination of parallelism or
> multi-host concurrency. I think Spark could end up being a very convenient
> framework to handle this aspect of a distributed application's
> architecture. It may be able to do some of this without any modification to
> either of these projects, but I haven't had the experience of actually
> attempting the implementation yet.
>
>
> On Nov 12, 2016, at 9:42 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>
> Hi Luciano,
>
> Mind sharing why you'd want a structured streaming source/sink for Akka
> if Kafka's available and Akka Streams has a Kafka module? #curious
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sat, Nov 12, 2016 at 4:07 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
> If you are interested in Akka streaming, it is being maintained in Apache
> Bahir. For Akka there isn't a structured streaming version yet, but we
> would be interested in collaborating on the structured streaming version
> for sure.
>
> On Thu, Nov 10, 2016 at 8:46 AM shyla deshpande <deshpandesh...@gmail.com>
> wrote:
>
>
> I am using Spark 2.0.1. I wanted to build a data pipeline using Kafka,
> Spark Streaming and Cassandra using Structured Streaming. But the kafka
> source support for Structured Streaming is not yet available. So now I am
> trying to use Akka Stream as the source to Spark Streaming.
>
> Want to make sure I am heading in the right direction. Please direct me to
> any sample code and reading material for this.
>
> Thanks
>
> --
> Sent from my Mobile device
>
>
>
>
>
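
The per-partition idea in Sean's quoted message (one backpressured stream
per task, with the job finishing once every task's stream has drained) can
be sketched roughly as follows. This is only an illustration written against
the JDK, not Spark or Akka code: java.util.concurrent.Flow, the JDK's copy of
the Reactive Streams interfaces, stands in for an Akka stream, and a fixed
thread pool stands in for Spark executors. The names PartitionStreams,
fetchAndProduce, OneAtATimeSink, runPartition and runJob are hypothetical,
and fetchAndProduce is a placeholder for "download the URL and produce it
to Kafka".

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.stream.Collectors;

class PartitionStreams {
    // Hypothetical placeholder for "download the URL and produce it to Kafka".
    static String fetchAndProduce(String url) {
        return "produced:" + url;
    }

    // Subscriber that requests one element at a time, so the source
    // can never run further ahead than its small buffer allows.
    static final class OneAtATimeSink implements Flow.Subscriber<String> {
        final CompletableFuture<Void> done = new CompletableFuture<>();
        private final ConcurrentLinkedQueue<String> out;
        private Flow.Subscription subscription;

        OneAtATimeSink(ConcurrentLinkedQueue<String> out) { this.out = out; }

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                       // initial demand
        }
        @Override public void onNext(String url) {
            out.add(fetchAndProduce(url));
            subscription.request(1);            // ask for the next element only when ready
        }
        @Override public void onError(Throwable t) { done.completeExceptionally(t); }
        @Override public void onComplete() { done.complete(null); }
    }

    // One "task": stream a partition through a bounded buffer and block until it drains.
    static void runPartition(List<String> urls, ExecutorService streamPool,
                             ConcurrentLinkedQueue<String> out) {
        OneAtATimeSink sink = new OneAtATimeSink(out);
        // Small buffer: submit() blocks while the buffer is full (backpressure).
        try (SubmissionPublisher<String> source = new SubmissionPublisher<>(streamPool, 2)) {
            source.subscribe(sink);
            urls.forEach(source::submit);
        } // close() signals onComplete once buffered items have been delivered
        sink.done.join();
    }

    // The "job": one task per partition; returns only after every partition completes.
    static List<String> runJob(List<List<String>> partitions) {
        ExecutorService tasks = Executors.newFixedThreadPool(2);      // stand-in for executors
        ExecutorService streamPool = Executors.newCachedThreadPool(); // runs the subscribers
        ConcurrentLinkedQueue<String> out = new ConcurrentLinkedQueue<>();
        try {
            List<CompletableFuture<Void>> running = partitions.stream()
                .map(p -> CompletableFuture.runAsync(() -> runPartition(p, streamPool, out), tasks))
                .collect(Collectors.toList());
            CompletableFuture.allOf(running.toArray(new CompletableFuture[0])).join();
        } finally {
            tasks.shutdown();
            streamPool.shutdown();
        }
        return out.stream().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = runJob(List.of(
            List.of("http://example.com/a", "http://example.com/b", "http://example.com/c"),
            List.of("http://example.com/d", "http://example.com/e")));
        System.out.println(result.size() + " URLs produced");
    }
}
```

In an actual Spark job the body of runPartition would presumably sit inside
the foreachPartition closure, with the stream built and materialized on the
executor and the closure blocking until the stream completes, so that Spark
sees the task as finished only when the partition's stream has drained.
Whether that works cleanly with Akka Streams is, as Sean notes, untested here.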
