About using Spark Streaming 1.2.1 with Kafka and the write-ahead log, I
will only say one thing: "a nightmare". ;-)

Let's see if things are better with 1.3.0:
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
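For reference, the 1.3.0 guide linked above introduces a direct
(receiver-less) Kafka API in which Spark tracks the offsets itself rather
than relying on a receiver plus write-ahead log. A minimal sketch of that
API, assuming a hypothetical broker at broker1:9092 and a topic named
"events":

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("DirectKafkaSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct stream (Spark 1.3+): no receiver; Spark queries Kafka for
    // offset ranges each batch, so no WAL is needed for zero data loss.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // hypothetical broker
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events")) // hypothetical topic

    messages.map(_._2).count().print() // values only; print per-batch counts
    ssc.start()
    ssc.awaitTermination()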
On Fri, Mar 13, 2015 at 8:33 PM, William Briggs wrote:
Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
it supports using an HDFS write-ahead log to ensure zero data loss while
streaming:
https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
-Will
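A minimal sketch of enabling that write-ahead log with the 1.2
receiver-based Kafka API; the ZooKeeper quorum, checkpoint path, consumer
group, and topic below are all placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf()
      .setAppName("KafkaWalSketch")
      // Turn on the HDFS write-ahead log described in the blog post
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///tmp/checkpoints") // placeholder path; WAL data lives under it

    // Receiver-based Kafka stream; with the WAL enabled, in-memory replication
    // can be dropped since received blocks are already persisted to HDFS.
    val lines = KafkaUtils.createStream(
      ssc,
      "zk1:2181",          // placeholder ZooKeeper quorum
      "example-group",     // placeholder consumer group
      Map("events" -> 1),  // placeholder topic -> receiver thread count
      StorageLevel.MEMORY_AND_DISK_SER)

    lines.map(_._2).count().print()
    ssc.start()
    ssc.awaitTermination()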
On Fri, Mar 13, 2015, ... wrote:
I would think that this is not a particularly great solution, as you will
end up running into quite a few edge cases, and I can't see it scaling
well: how do you know which server to copy logs from in a clustered and
replicated environment? What happens when Kafka detects a failure?
Manually managing data locality will become difficult to scale. Kafka is
one potential tool you can use to help scale, but by itself, it will not
solve your problem. If you need the data in near-real time, you could use a
technology like Spark or Storm to stream data from Kafka and perform your
processing.
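As an illustration of that pattern, here is a self-contained sketch of a
sliding-window aggregation over a Kafka stream in Spark Streaming; the
broker address, topic, and window sizes are hypothetical:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("NearRealTimeSketch")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("hdfs:///tmp/checkpoints") // placeholder; needed for stateful operations

    val events = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("events"))

    // Count events per key over a sliding 60s window, refreshed every 5s:
    // the near-real-time processing step that a batch log copy can't give you.
    events
      .map { case (key, _) => (key, 1L) }
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(5))
      .print()

    ssc.start()
    ssc.awaitTermination()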