Let us assume that you want to build an integration test setup where you run all participating components in Docker.
You create a docker-compose.yml with four services, something like this:

# Start docker-compose.yml
version: '2'
services:
  myapp:
    build: myapp_dir
    links:
      - kafka
      - cassandra
  kafka:
    image: spotify/kafka
    environment:
      - ADVERTISED_HOST
    ports:
      - "2181:2181"
      - "9092:9092"
  cassandra:
    image: spotify/cassandra
    environment:
      - <might need some tweaking here>
    ports:
      - "9042:9042"
  test_harness:
    build: test_harness_dir
    links:
      - kafka
      - cassandra
# End docker-compose.yml

I haven't used the spotify/cassandra image, so you might need to do some
environment variable plumbing to get it working. Your test harness would
then push messages to Kafka, and poll Cassandra for the expected output.
Your Spark Streaming application image has Spark installed, and runs
Spark with a local master.

You need to run this on a machine that has Docker and Docker Compose
installed, typically an Ubuntu host. This machine can be either bare
metal or a full VM (VirtualBox, VMware, Xen), which is what you get if
you run in an IaaS cloud like GCE or EC2. Hence, your CI/CD Jenkins
machine should be a dedicated instance. Developers with Macs would run
docker-machine, which uses VirtualBox, IIRC. Developers with Linux
machines can run Docker and Docker Compose natively.

You can in theory run Jenkins in Docker and spin up new Docker
containers from inside Docker with some docker-inside-docker setup. It
adds complexity, however, and I suspect it will be brittle, so I don't
recommend it.

You could also in theory use a cloud container service that runs your
images during tests. Such services have different ways of wiring
containers together than Docker Compose, however, so this also
increases complexity and makes the CI/CD setup different from the setup
on local developer machines. I went down this path once, and I cannot
recommend it.

If you instead want a setup where the test harness and your Spark
Streaming application run outside Docker, you omit them from
docker-compose.yml, have the test harness run docker-compose itself,
and figure out the ports and addresses to connect to. As mentioned
earlier, this requires more plumbing, but results in an integration
test setup that runs smoothly from Gradle/Maven/SBT and also from
IntelliJ.
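To make the outside-Docker variant concrete, here is a minimal sketch
of such a harness in scalatest, assuming the Kafka producer client and
the DataStax Java driver are on the test classpath. The DOCKER_HOST_IP
variable and the topic, keyspace, and table names are placeholders for
illustration, not something the images dictate:

// Start StreamingIntegrationTest.scala
import java.util.Properties

import scala.sys.process._

import com.datastax.driver.core.Cluster
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class StreamingIntegrationTest extends FunSuite with BeforeAndAfterAll {

  // Docker host address: the output of "docker-machine ip" on Mac, or the
  // docker0 address (or localhost, since the ports are published) on Linux.
  val dockerHost = sys.env.getOrElse("DOCKER_HOST_IP", "localhost")

  override def beforeAll(): Unit = {
    // Bring up Kafka and Cassandra in the background.
    require(Seq("docker-compose", "up", "-d").! == 0)
    // A real harness would also wait here until both services accept
    // connections, e.g. by retrying a connect in a loop.
  }

  override def afterAll(): Unit = {
    Seq("docker-compose", "down").!
  }

  test("streaming job writes the expected row to Cassandra") {
    // Push an input message to Kafka.
    val props = new Properties()
    props.put("bootstrap.servers", s"$dockerHost:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    producer.send(
      new ProducerRecord[String, String]("input_topic", "key1", "some input")
    ).get()
    producer.close()

    // Poll Cassandra for the expected output, with a deadline so that a
    // failing test terminates instead of hanging. Assumes the keyspace and
    // table have been created by the SUT or by setup code.
    val cluster =
      Cluster.builder().addContactPoint(dockerHost).withPort(9042).build()
    val session = cluster.connect()
    try {
      val deadline = System.currentTimeMillis() + 60000
      var found = false
      while (!found && System.currentTimeMillis() < deadline) {
        val row = session.execute(
          "SELECT * FROM test_keyspace.output WHERE key = 'key1'").one()
        found = row != null // compare columns to expected values here
        if (!found) Thread.sleep(1000)
      }
      assert(found, "expected output did not appear in Cassandra in time")
    } finally {
      session.close()
      cluster.close()
    }
  }
}
// End StreamingIntegrationTest.scala

The same push-and-poll oracle works in the all-Docker variant; the
harness then reaches Kafka and Cassandra through their Compose service
names instead of the Docker host address.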
I hope things are clearer. Let me know if you have further questions.

Regards,

Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
Calendar: https://goo.gl/6FBtlS


On Thu, Jul 7, 2016 at 3:14 AM, swetha kasireddy
<swethakasire...@gmail.com> wrote:
> Can this docker image be used to spin up a kafka cluster in a CI/CD
> pipeline like Jenkins to run the integration tests? Or can it be done
> only on a local machine that has docker installed? I assume that the
> box where the CI/CD pipeline runs should have docker installed,
> correct?
>
> On Mon, Jul 4, 2016 at 5:20 AM, Lars Albertsson <la...@mapflat.com> wrote:
>>
>> I created such a setup for a client a few months ago. It is pretty
>> straightforward, but it can take some work to get all the wires
>> connected.
>>
>> I suggest that you start with the spotify/kafka
>> (https://github.com/spotify/docker-kafka) Docker image, since it
>> includes a bundled zookeeper. The alternative would be to spin up a
>> separate Zookeeper Docker container and connect them, but for testing
>> purposes, it would make the setup more complex.
>>
>> You'll need to inform Kafka about the external address it exposes by
>> setting ADVERTISED_HOST to the output of "docker-machine ip" (on Mac)
>> or the address printed by "ip addr show docker0" (Linux). I also
>> suggest setting AUTO_CREATE_TOPICS to true.
>>
>> You can choose to run your Spark Streaming application under test
>> (SUT) and your test harness also in Docker containers, or directly on
>> your host.
>>
>> In the former case, it is easiest to set up a Docker Compose file
>> linking the harness and SUT to Kafka. This variant provides better
>> isolation, and might integrate better if you have existing similar
>> test frameworks.
>>
>> If you want to run the harness and SUT outside Docker, I suggest that
>> you build your harness with a standard test framework, e.g. scalatest
>> or JUnit, and run both harness and SUT in the same JVM. In this case,
>> you put code to bring up the Kafka Docker container in test framework
>> setup methods. This test strategy integrates better with IDEs and
>> build tools (mvn/sbt/gradle), since they will run (and debug) your
>> tests without any special integration. I therefore prefer this
>> strategy.
>>
>> What is the output of your application? If it is messages on a
>> different Kafka topic, the test harness can merely subscribe and
>> verify output. If you emit output to a database, you'll need another
>> Docker container, integrated with Docker Compose. If you are emitting
>> database entries, your test oracle will need to frequently poll the
>> database for the expected records, with a timeout in order not to
>> hang on failing tests.
>>
>> I hope this is comprehensible. Let me know if you have followup
>> questions.
>>
>> Regards,
>>
>> Lars Albertsson
>> Data engineering consultant
>> www.mapflat.com
>> +46 70 7687109
>> Calendar: https://goo.gl/6FBtlS
>>
>>
>> On Thu, Jun 30, 2016 at 8:19 PM, SRK <swethakasire...@gmail.com> wrote:
>> > Hi,
>> >
>> > I need to do integration tests using Spark Streaming. My idea is to
>> > spin up kafka using docker locally and use it to feed the stream to
>> > my Streaming Job. Any suggestions on how to do this would be of
>> > great help.
>> >
>> > Thanks,
>> > Swetha