I created such a setup for a client a few months ago. It is pretty straightforward, but it can take some work to get all the wires connected.
I suggest that you start with the spotify/kafka (https://github.com/spotify/docker-kafka) Docker image, since it includes a bundled ZooKeeper. The alternative would be to spin up a separate ZooKeeper Docker container and connect the two, but for testing purposes that only makes the setup more complex. You'll need to inform Kafka about the external address it exposes by setting ADVERTISED_HOST to the output of "docker-machine ip" (on Mac) or to the address printed by "ip addr show docker0" (on Linux). I also suggest setting AUTO_CREATE_TOPICS to true.

You can run your Spark Streaming application under test (SUT) and your test harness either in Docker containers as well, or directly on your host.

In the former case, it is easiest to set up a Docker Compose file that links the harness and the SUT to Kafka. This variant provides better isolation, and it may integrate better if you have similar existing test frameworks.

If you want to run the harness and the SUT outside Docker, I suggest that you build the harness with a standard test framework, e.g. scalatest or JUnit, and run both harness and SUT in the same JVM. In that case, you put the code that brings up the Kafka Docker container in the test framework's setup methods. This strategy integrates better with IDEs and build tools (mvn/sbt/gradle), since they will run (and debug) your tests without any special integration. I therefore prefer it.

What is the output of your application? If it is messages on a different Kafka topic, the test harness can merely subscribe to that topic and verify the output. If you emit output to a database, you'll need another Docker container for the database, integrated with Docker Compose, and your test oracle will need to poll the database repeatedly for the expected records, with a timeout so that it does not hang on failing tests.

I hope this is comprehensible. Let me know if you have follow-up questions.
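For the all-in-Docker variant, the Compose file could look roughly like the sketch below (Compose v1 syntax, which was current at the time). The "sut" and "harness" image names are placeholders for your own images, and the ADVERTISED_HOST value is whatever "docker-machine ip" or the docker0 address gives you on your machine:

```yaml
# docker-compose.yml sketch -- image names for sut/harness are placeholders.
kafka:
  image: spotify/kafka
  ports:
    - "2181:2181"   # ZooKeeper
    - "9092:9092"   # Kafka broker
  environment:
    ADVERTISED_HOST: 192.168.99.100   # replace with your docker-machine ip / docker0 address
    ADVERTISED_PORT: 9092
    AUTO_CREATE_TOPICS: "true"

sut:                 # your Spark Streaming application under test
  image: my-streaming-job
  links:
    - kafka

harness:             # the test driver that feeds input and checks output
  image: my-test-harness
  links:
    - kafka
```

The links give the SUT and harness a "kafka" hostname to connect to, so they don't need the advertised external address themselves.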
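For the in-JVM variant, the container start/stop can be wrapped in a small helper that you invoke from the test framework's setup and teardown methods (e.g. scalatest's beforeAll/afterAll). This is only a sketch: the container name is arbitrary, and the environment variables are the ones discussed above.

```scala
import scala.sys.process._

/** Helper for bringing the spotify/kafka container up and down from test
  * framework setup/teardown methods. The container name is an arbitrary
  * choice for this sketch. */
object KafkaDockerHarness {
  val ContainerName = "it-kafka"

  /** Build the `docker run` command line for the spotify/kafka image,
    * advertising the given host address (docker-machine ip / docker0). */
  def startCommand(advertisedHost: String): Seq[String] = Seq(
    "docker", "run", "-d", "--name", ContainerName,
    "-p", "2181:2181", "-p", "9092:9092",
    "-e", s"ADVERTISED_HOST=$advertisedHost",
    "-e", "ADVERTISED_PORT=9092",
    "-e", "AUTO_CREATE_TOPICS=true",
    "spotify/kafka"
  )

  def stopCommand: Seq[String] =
    Seq("docker", "rm", "-f", ContainerName)

  // Call start() from beforeAll() and stop() from afterAll() in your suite.
  def start(advertisedHost: String): Unit = startCommand(advertisedHost).!!
  def stop(): Unit = stopCommand.!
}
```

Keeping the command construction separate from the execution makes the wiring itself testable without a Docker daemon around.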
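For the database case, the poll-with-timeout oracle boils down to a generic retry loop. Here "query" is a stand-in for whatever lookup your oracle performs (a JDBC query for the expected records, for instance); the interval and timeout values are illustrative:

```scala
import scala.annotation.tailrec

/** Poll until `query` yields a result, or throw after `timeoutMillis`
  * so that a failing test cannot hang forever. */
def eventually[A](timeoutMillis: Long, intervalMillis: Long = 200)
                 (query: () => Option[A]): A = {
  val deadline = System.currentTimeMillis() + timeoutMillis
  @tailrec def loop(): A = query() match {
    case Some(result) => result
    case None if System.currentTimeMillis() < deadline =>
      Thread.sleep(intervalMillis)   // back off before polling again
      loop()
    case None =>
      throw new AssertionError(
        s"expected records did not appear within $timeoutMillis ms")
  }
  loop()
}
```

In a test you would call it like `eventually(30000) { () => lookupExpectedRows() }` and assert on the returned records.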
Regards,

Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
Calendar: https://goo.gl/6FBtlS

On Thu, Jun 30, 2016 at 8:19 PM, SRK <swethakasire...@gmail.com> wrote:
> Hi,
>
> I need to do integration tests using Spark Streaming. My idea is to spin up
> kafka using docker locally and use it to feed the stream to my Streaming
> Job. Any suggestions on how to do this would be of great help.
>
> Thanks,
> Swetha
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-spin-up-Kafka-using-docker-and-use-for-Spark-Streaming-Integration-tests-tp27252.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> ---------------------------------------------------------------------