Late reply...

Connecting from Zeppelin Docker image to an external Hadoop cluster works for me.

If you want to connect to Hadoop on your host laptop, you have to hack the IP address.

More info on http://platform.datalayer.io/guide/latest/docker/zeppelin (read the "Spark in YARN mode" section)
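As a sketch of that IP hack (the image name and LAN IP below are assumptions, not from the guide):

```shell
# Hypothetical sketch: make the host laptop reachable from inside the
# container by mapping a stable hostname to the laptop's LAN IP, so the
# Hadoop/Spark config inside the container can refer to "hadoop-host".
docker run -it --rm \
  --add-host hadoop-host:192.168.1.16 \
  -p 8080:8080 \
  datalayer/zeppelin    # image name is an assumption
```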


On 06/08/16 09:05, DuyHai Doan wrote:
My host IP is 192.168.1.16
The VM IP is 10.0.2.15 (using Docker4Mac)

Indeed I tried many things:

1) using the host network (--net=host), but then I cannot access Zeppelin
(localhost:8080 or 127.0.0.1:8080). Zeppelin
listens on 0.0.0.0 by default (configured in zeppelin-site.xml);
changing it to 127.0.0.1 does not help
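For reference, attempt 1 would look like the sketch below (image and tag are assumptions). A likely explanation for the failure: on Docker for Mac, containers run inside a hidden Linux VM, so --net=host attaches to the VM's network namespace, not macOS's, and localhost:8080 on the Mac never reaches Zeppelin.

```shell
# Sketch of attempt 1: share the "host" network namespace.
# On Docker for Mac this is the internal VM's namespace (10.0.2.15),
# which is why the UI is unreachable from the Mac's localhost.
docker run -it --rm --net=host apache/zeppelin:0.6.0
```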

2) using the bridge network and mapping ports:

     - run the container with -p 8080:8080 -p 8081:8081 -p 4040:4040;
       this time I can access the Zeppelin Web UI, but the Spark job hangs

     - edit $ZEPPELIN_HOME/bin/interpreter.sh to start Zeppelin in
"cluster" mode (--deploy-mode cluster), but it does not work

     - force the driver host to 192.168.1.16 and the driver port to a fixed
port (9991) instead of a random one --> the job launches
successfully, but when calling rdd.collect() the worker running on the
host machine cannot send the result back to the driver program inside
the container:

16/08/05 14:16:40 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /10.0.2.15:42418
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)

  The driver program is still listening on the internal IP address on a
dynamic port...
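That error points at the block manager, which opens its own dynamic port in addition to spark.driver.port. A sketch of pinning every driver-side port so each can be published with -p (property names are from the Spark configuration docs; the values and image name are assumptions, and spark.driver.bindAddress only exists in Spark 2.1+):

```shell
# Sketch: pin the driver RPC port AND the block manager port, advertise
# the host's LAN IP, and publish all of them. Values are illustrative.
export SPARK_SUBMIT_OPTIONS="--conf spark.driver.host=192.168.1.16 \
  --conf spark.driver.port=9991 \
  --conf spark.blockManager.port=9992"

docker run -it --rm \
  -e SPARK_SUBMIT_OPTIONS \
  -p 8080:8080 -p 4040:4040 -p 9991:9991 -p 9992:9992 \
  datalayer/zeppelin    # image name is an assumption
```

Without spark.blockManager.port the collect() result still travels back over a random port (42418 in the log above) that was never published, which matches the observed hang.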

On Fri, Aug 5, 2016 at 11:41 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

    Was having some issues with this too.

    Did you try using %sh to ping the machine running Spark? i.e.
    check for a networking issue.
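For example, a quick connectivity check from a Zeppelin note (the IP and port below are assumptions; 7077 is the default standalone master port):

```shell
%sh
# Verify the Spark machine is reachable from inside the container at all.
ping -c 3 192.168.1.16
# Then check the master port specifically (assumes nc is installed).
nc -zv 192.168.1.16 7077
```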

    Trevor Grant
    Data Scientist
    https://github.com/rawkintrevo
    http://stackexchange.com/users/3002022/rawkintrevo
    http://trevorgrant.org

    /"Fortunate is he, who is able to know the causes of things."  -Virgil/


    On Fri, Aug 5, 2016 at 3:29 PM, Luciano Resende
    <luckbr1...@gmail.com> wrote:



        On Fri, Aug 5, 2016 at 11:18 PM, Luciano Resende
        <luckbr1...@gmail.com> wrote:

            Not exactly what you want, but I have an example here:
            https://github.com/lresende/docker-systemml-notebook

            You should be able to accomplish what you want by playing with
            --link, which I did in the example below (but just with YARN
            and HDFS):
            https://github.com/lresende/docker-yarn-cluster
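A minimal sketch of the --link approach (container and image names here are assumptions, not taken from that repo):

```shell
# Sketch: start a YARN/HDFS container, then link Zeppelin to it so the
# resource manager / namenode hostname resolves inside the Zeppelin
# container via Docker's legacy --link name injection.
docker run -d --name yarn-master lresende/yarn-cluster   # image name assumed
docker run -it --rm \
  --link yarn-master:yarn-master \
  -p 8080:8080 \
  apache/zeppelin:0.6.0                                  # image/tag assumed
```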


        BTW, you might have to use Livy to access the remote Spark.

        --
        Luciano Resende
        http://twitter.com/lresende1975
        http://lresende.blogspot.com/


