A lot of folks are using the new Kafka Direct Stream API in production.

And a lot of folks who used the old Kafka Receiver-based API are migrating
over.

The usual downside to "Experimental" features in Spark is that the API
might change, so you'll need to rewrite some code.

Stability-wise, the Spark codebase has a TON of tests around every new
feature - including experimental features.

From experience, the Kafka Direct Stream API is very stable and a lot of
momentum is behind this implementation.

Check out my *pipeline* GitHub project to see this implementation fully
configured and working within a Docker instance:

https://github.com/fluxcapacitor/pipeline/wiki

Here's a link to the Kafka, kafka-rest-api, and Kafka Schema Registry
configuration: https://github.com/fluxcapacitor/pipeline/tree/master/config

And here's a link to a sample app that uses the Kafka Direct API and stores
data in Cassandra:

https://github.com/fluxcapacitor/pipeline/blob/master/myapps/streaming/src/main/scala/com/advancedspark/streaming/rating/store/Cassandra.scala

https://github.com/fluxcapacitor/pipeline/blob/master/myapps/streaming/start-streaming-ratings-cassandra.sh

Here's a link to the Dockerfile for the image, which contains the
installation scripts for Confluent's Kafka distribution:

https://github.com/fluxcapacitor/pipeline/blob/master/Dockerfile

All code is in Scala.

On Wed, Dec 30, 2015 at 11:26 AM, David Newberger <
david.newber...@wandcorp.com> wrote:

> Hi All,
>
> I’ve been looking at the Direct Approach for streaming Kafka integration (
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html)
> because it looks like a good fit for our use cases. My concern is the
> feature is experimental according to the documentation. Has anyone used
> this approach yet and if so what has your experience been with using it? If
> it helps we’d be looking to implement it using Scala. Secondly, in general
> what has people's experience been with using experimental features in Spark?
>
> Cheers,
>
> David Newberger



-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
