Hi Jacob,

Thanks for the help & answer on the Docker question. Have you already experimented with the new link feature in Docker? It does not help the HDFS issue, as the DataNode needs the NameNode and vice versa, but it does facilitate simpler client-server interactions.
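To illustrate, linking containers with Docker's `--link` flag looks roughly like this; the image names, the alias, and the exposed port (8020) are hypothetical:

```shell
# Start the server container first; --link is one-way (client -> server),
# which is why it cannot solve the NameNode <-> DataNode cycle.
docker run -d --name namenode my-hdfs-namenode   # hypothetical image

# The client container gets the server's address injected as environment
# variables and an /etc/hosts entry under the alias "nn".
docker run --rm --link namenode:nn my-hdfs-client \
    sh -c 'echo "namenode is at $NN_PORT_8020_TCP_ADDR"'
```

This only works for simple client-server pairs, since the link target must already be running when the client starts.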
My issue described at the beginning is related to networking between the host and the Docker images, but I was losing too much time tracking down the exact problem, so I moved my Spark job driver into the Mesos node and it started working. Sadly, my Mesos UI is partially crippled, as workers are not addressable (and therefore Spark job logs are hard to gather). Your discussion about dynamic port allocation is very relevant to understanding why some components cannot talk to each other. I'll need to have a more in-depth read of that discussion to find a better solution for my local development environment.

regards, Gerard.

On Tue, May 6, 2014 at 3:30 PM, Jacob Eisinger <jeis...@us.ibm.com> wrote:
> Howdy,
>
> You might find the discussion Andrew and I have been having about Docker and network security [1] applicable.
>
> Also, I posted an answer [2] to your stackoverflow question.
>
> [1] http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-driver-interacting-with-Workers-in-YARN-mode-firewall-blocking-communication-tp5237p5441.html
> [2] http://stackoverflow.com/questions/23410505/how-to-run-hdfs-cluster-without-dns/23495100#23495100
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeis...@us.ibm.com - (512) 286-6075
>
> From: Gerard Maas <gerard.m...@gmail.com>
> To: user@spark.apache.org
> Date: 05/05/2014 04:18 PM
> Subject: Re: Local Dev Env with Mesos + Spark Streaming on Docker: Can't submit jobs.
>
> Hi Benjamin,
>
> Yes, we initially used a modified version of the AmpLab docker scripts [1]. The AmpLab docker images are a good starting point.
> One of the biggest hurdles has been HDFS, which requires reverse DNS, and I didn't want to go the dnsmasq route, to keep the containers relatively simple to use without the need for external scripts. I ended up running a 1-node setup (NameNode + DataNode). I'm still looking for a better solution for HDFS [2].
>
> Our use case for Docker is to easily create local dev environments, both for development and for automated functional testing (using cucumber). My aim is to strongly reduce the time of the develop-deploy-test cycle. That also means that we run the minimum number of instances required to have a functionally working setup, e.g. 1 Zookeeper, 1 Kafka broker, ...
>
> For the actual cluster deployment we have a Chef-based devops toolchain that puts things in place on public cloud providers. Personally, I think Docker rocks and would like to replace those complex cookbooks with Dockerfiles once the technology is mature enough.
>
> -greetz, Gerard.
>
> [1] https://github.com/amplab/docker-scripts
> [2] http://stackoverflow.com/questions/23410505/how-to-run-hdfs-cluster-without-dns
>
> On Mon, May 5, 2014 at 11:00 PM, Benjamin <bboui...@gmail.com> wrote:
>
> Hi,
>
> Before considering running on Mesos, did you try to submit the application on Spark deployed without Mesos on Docker containers?
>
> I am currently investigating this idea to deploy a complete set of clusters quickly with Docker, so I'm interested in your findings on sharing the settings of Kafka and Zookeeper across nodes. How many brokers and ZooKeeper nodes do you use?
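As a reference point for the 1-node NameNode+DataNode workaround above, a container along these lines can be started with a pinned hostname so that forward and reverse lookups inside the container agree; the image name, hostname, and published ports are hypothetical:

```shell
# Pin the container's hostname so that HDFS's reverse-DNS lookups resolve
# consistently inside the container (NameNode and DataNode co-located).
docker run -d -h hdfs.local --name hdfs \
    -p 8020:8020 -p 50070:50070 \
    my-hdfs-single-node   # hypothetical image running both daemons
```

This sidesteps dnsmasq entirely because all HDFS daemons see the same self-resolving hostname, at the cost of being limited to a single data node.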
> Regards,
>
> On Mon, May 5, 2014 at 10:11 PM, Gerard Maas <gerard.m...@gmail.com> wrote:
>
> Hi all,
>
> I'm currently working on creating a set of docker images to facilitate local development with Spark/streaming on Mesos (+zk, hdfs, kafka).
>
> After solving the initial hurdles to get things working together in docker containers, now everything seems to start up correctly and the Mesos UI shows slaves as they are started.
>
> I'm trying to submit a job from IntelliJ, and the job submissions seem to get lost in Mesos translation. The logs are not helping me to figure out what's wrong, so I'm posting them here in the hope that they ring a bell and somebody could provide me a hint on what's wrong/missing with my setup.
>
> ---- DRIVER (IntelliJ running a Job.scala main) ----
> 14/05/05 21:52:31 INFO MetadataCleaner: Ran metadata cleaner for SHUFFLE_BLOCK_MANAGER
> 14/05/05 21:52:31 INFO BlockManager: Dropping broadcast blocks older than 1399319251962
> 14/05/05 21:52:31 INFO BlockManager: Dropping non broadcast blocks older than 1399319251962
> 14/05/05 21:52:31 INFO MetadataCleaner: Ran metadata cleaner for BROADCAST_VARS
> 14/05/05 21:52:31 INFO MetadataCleaner: Ran metadata cleaner for BLOCK_MANAGER
> 14/05/05 21:52:32 INFO MetadataCleaner: Ran metadata cleaner for HTTP_BROADCAST
> 14/05/05 21:52:32 INFO MetadataCleaner: Ran metadata cleaner for MAP_OUTPUT_TRACKER
> 14/05/05 21:52:32 INFO MetadataCleaner: Ran metadata cleaner for SPARK_CONTEXT
>
> ---- MESOS MASTER ----
> I0505 19:52:39.718080 388 master.cpp:690] Registering framework 201405051517-67113388-5050-383-6995 at scheduler(1)@127.0.1.1:58115
> I0505 19:52:39.718261 388 master.cpp:493] Framework 201405051517-67113388-5050-383-6995 disconnected
> I0505 19:52:39.718277 389 hierarchical_allocator_process.hpp:332] Added framework 201405051517-67113388-5050-383-6995
> I0505 19:52:39.718312 388 master.cpp:520] Giving framework 201405051517-67113388-5050-383-6995 0ns to failover
> I0505 19:52:39.718431 389 hierarchical_allocator_process.hpp:408] Deactivated framework 201405051517-67113388-5050-383-6995
> W0505 19:52:39.718459 388 master.cpp:1388] Master returning resources offered to framework 201405051517-67113388-5050-383-6995 because the framework has terminated or is inactive
> I0505 19:52:39.718567 388 master.cpp:1376] Framework failover timeout, removing framework 201405051517-67113388-5050-383-6995
>
> ---- MESOS SLAVE ----
> I0505 19:49:27.662019 20 slave.cpp:1191] Asked to shut down framework 201405051517-67113388-5050-383-6803 by master@172.17.0.4:5050
> W0505 19:49:27.662072 20 slave.cpp:1206] Cannot shut down unknown framework 201405051517-67113388-5050-383-6803
> I0505 19:49:28.662153 18 slave.cpp:1191] Asked to shut down framework 201405051517-67113388-5050-383-6804 by master@172.17.0.4:5050
> W0505 19:49:28.662212 18 slave.cpp:1206] Cannot shut down unknown framework 201405051517-67113388-5050-383-6804
> I0505 19:49:29.662199 13 slave.cpp:1191] Asked to shut down framework 201405051517-67113388-5050-383-6805 by master@172.17.0.4:5050
> W0505 19:49:29.662256 13 slave.cpp:1206] Cannot shut down unknown framework 201405051517-67113388-5050-383-6805
> I0505 19:49:30.662443 16 slave.cpp:1191] Asked to shut down framework 201405051517-67113388-5050-383-6806 by master@172.17.0.4:5050
> W0505 19:49:30.662489 16 slave.cpp:1206] Cannot shut down unknown framework 201405051517-67113388-5050-383-6806
>
> Thanks in advance,
>
> Gerard.
>
> --
> Benjamin Bouillé
> +33 665 050 285
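One detail worth noting in the master log above: the framework registers at scheduler(1)@127.0.1.1:58115, a loopback address that the Mesos master and slaves cannot reach, which would explain the immediate disconnect. A hedged sketch of making the driver advertise a reachable address; 172.17.42.1 is a placeholder for the host's docker bridge IP, and the fixed port is an arbitrary choice:

```shell
# Make the Spark driver bind to / advertise an address the Mesos cluster
# can reach, instead of the loopback 127.0.1.1 seen in the master log.
export SPARK_LOCAL_IP=172.17.42.1   # placeholder: the host's docker0 IP

# Or set the equivalent Spark properties when building the SparkConf:
#   spark.driver.host=172.17.42.1
#   spark.driver.port=51000   # fixed port; open it in any firewall rules
```

This is only a guess at the root cause from the log line, but pinning `SPARK_LOCAL_IP` (or `spark.driver.host`) is the usual first step when a driver registers with a loopback or otherwise unreachable address.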