A lot of folks are using the new Kafka Direct Stream API in production. And a lot of folks who used the old Kafka Receiver-based API are migrating over.
The usual downside to "Experimental" features in Spark is that the API might change, so you may need to rewrite some code.

Stability-wise, the Spark codebase has a TON of tests around every new feature, including experimental features. From experience, the Kafka Direct Stream API is very stable and a lot of momentum is behind this implementation.

Check out my *pipeline* GitHub project to see this implementation fully configured and working within a Docker instance:

https://github.com/fluxcapacitor/pipeline/wiki

Here's a link to the Kafka, kafka-rest-api, and Kafka schema registry configuration:

https://github.com/fluxcapacitor/pipeline/tree/master/config

And here's a link to a sample app that uses the Kafka Direct API and stores data in Cassandra:

https://github.com/fluxcapacitor/pipeline/blob/master/myapps/streaming/src/main/scala/com/advancedspark/streaming/rating/store/Cassandra.scala
https://github.com/fluxcapacitor/pipeline/blob/master/myapps/streaming/start-streaming-ratings-cassandra.sh

Here's a link to the Dockerfile for the Docker image, which contains the installation scripts for Confluent's Kafka Distribution:

https://github.com/fluxcapacitor/pipeline/blob/master/Dockerfile

All code is in Scala.

On Wed, Dec 30, 2015 at 11:26 AM, David Newberger <david.newber...@wandcorp.com> wrote:

> Hi All,
>
> I’ve been looking at the Direct Approach for streaming Kafka integration
> (http://spark.apache.org/docs/latest/streaming-kafka-integration.html)
> because it looks like a good fit for our use cases. My concern is the
> feature is experimental according to the documentation. Has anyone used
> this approach yet and if so what has your experience been with using it? If
> it helps we’d be looking to implement it using Scala. Secondly, in general
> what has people's experience been with using experimental features in Spark?
>
> Cheers,
>
> David Newberger

--
*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com