Connect Dockerized Zeppelin to dev Cloudera 6.3.1 cluster [Promise timeout when Registering interpreter process]

2021-04-01 Thread Theo Diefenthal
Hi there, 

I want to achieve the following use case: start Zeppelin 0.9.0 (in docker) on 
my local dev machine, but let the Spark jobs in the notebook run on a remote 
cluster via YARN. 

For a few hours now, I have been trying to set up that environment against my 
company's Cloudera CDH 6.3.1 development cluster. That cluster is unsecured 
(though it can only be reached when connected to the VPN). With a lot of trial 
and error I finally achieved a successful connection from my dockerized 
Zeppelin to the cluster: when I start running a Spark cell in Zeppelin, I can 
see a new application in YARN on the cluster side [named spark-shared_process]. 
However, the execution of the cell eventually fails with the following stack 
trace in the YARN application [1]. I have no idea where this timeout could 
come from, and I'd be happy if you could help me out here. Within the said VPN 
to the dev cluster there are no connection restrictions such as firewalls in 
place. The cell I run is the first one in the "3. Spark SQL (Scala)" Zeppelin 
quick-start notebook, titled "Create Dataset/DataFrame via SparkSession". 

For reference, I also attach my docker-compose file [2] and my Dockerfile for 
building Zeppelin with Spark and Hadoop [3]. (Note that I add the Hadoop conf 
files into the image because I'd like to distribute the image as ready-to-run 
to the other people in my project, without requiring them to copy over the 
conf files themselves.) After starting the container, I further change the 
interpreter settings by selecting yarn-cluster in the %spark interpreter 
settings and setting zeppelin.interpreter.connect.timeout to 600000 (ms). 
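
In zeppelin-env.sh terms, the setup corresponds roughly to the sketch below 
(the paths and values here are simplified assumptions; the attached files 
[2][3] are what I actually use): 

  export SPARK_HOME=/opt/spark                        # Spark bundled into the image [3] 
  export HADOOP_CONF_DIR=/opt/hadoop/conf             # Hadoop conf baked into the image 
  export ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT=600000  # ms; same value I set in the UI 

(In the 0.9 %spark settings, yarn-cluster corresponds to master=yarn plus 
spark.submit.deployMode=cluster.) 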

Best regards 
Theo 

PS: HDFS in general seems to work well. [4] 
PPS: I also attach the docker container logs from one attempt. [5] 



[1] 
INFO [2021-04-01 23:48:20,984] ({main} Logging.scala[logInfo]:54) - Registered signal handler for TERM 
INFO [2021-04-01 23:48:21,005] ({main} Logging.scala[logInfo]:54) - Registered signal handler for HUP 
INFO [2021-04-01 23:48:21,014] ({main} Logging.scala[logInfo]:54) - Registered signal handler for INT 
INFO [2021-04-01 23:48:22,158] ({main} Logging.scala[logInfo]:54) - Changing view acls to: yarn,sandbox 
INFO [2021-04-01 23:48:22,160] ({main} Logging.scala[logInfo]:54) - Changing modify acls to: yarn,sandbox 
INFO [2021-04-01 23:48:22,161] ({main} Logging.scala[logInfo]:54) - Changing view acls groups to: 
INFO [2021-04-01 23:48:22,162] ({main} Logging.scala[logInfo]:54) - Changing modify acls groups to: 
INFO [2021-04-01 23:48:22,168] ({main} Logging.scala[logInfo]:54) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, sandbox); groups with view permissions: Set(); users with modify permissions: Set(yarn, sandbox); groups with modify permissions: Set() 
INFO [2021-04-01 23:48:25,388] ({main} Logging.scala[logInfo]:54) - Preparing Local resources 
WARN [2021-04-01 23:48:28,111] ({main} NativeCodeLoader.java[<clinit>]:62) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
INFO [2021-04-01 23:48:29,004] ({main} Logging.scala[logInfo]:54) - ApplicationAttemptId: appattempt_1617228950227_5781_01 
INFO [2021-04-01 23:48:29,041] ({main} Logging.scala[logInfo]:54) - Starting the user application in a separate Thread 
INFO [2021-04-01 23:48:29,289] ({main} Logging.scala[logInfo]:54) - Waiting for spark context initialization... 
INFO [2021-04-01 23:48:30,007] ({RegisterThread} RemoteInterpreterServer.java[run]:595) - Start registration 
INFO [2021-04-01 23:48:30,009] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:193) - Launching ThriftServer at 99.99.99.99:44802 
INFO [2021-04-01 23:48:31,276] ({RegisterThread} RemoteInterpreterServer.java[run]:609) - Registering interpreter process 
ERROR [2021-04-01 23:50:09,531] ({main} Logging.scala[logError]:91) - Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [10 milliseconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227) 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:469) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:780) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.a

Re: Connect Dockerized Zeppelin to dev Cloudera 6.3.1 cluster [Promise timeout when Registering interpreter process]

2021-04-01 Thread Jeff Zhang
Most likely it is due to a network issue: the connection between the Spark
driver (which runs in a YARN container in yarn-cluster mode) and the
Zeppelin server is bidirectional. It looks like your Spark driver is unable
to connect back to the Zeppelin server.
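
A quick way to verify that reverse path is to test, from a YARN NodeManager
host, whether the Zeppelin server's rpc port is reachable (a sketch; the
placeholders are assumptions, substitute your Zeppelin host and its
configured port):

  nc -vz <zeppelin-server-host> <zeppelin-rpc-port>

If Zeppelin runs in docker behind NAT, one common option on Linux is host
networking, so the container is directly reachable from the cluster, e.g.

  docker run --network host apache/zeppelin:0.9.0   # or your own image from [3]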


Re: Zeppelin at scale

2021-04-01 Thread Jeff Zhang
I don't think anyone uses Zeppelin at that scale (1k users) for now. It
would be interesting to learn about your usage scenario.

Carlos Diogo wrote on Thu, Apr 1, 2021 at 1:56 AM:

> Hi
> My two cents: the only way I know to scale this would be a
> container-based deployment like OpenShift. You would have isolation per
> user, with each user's interpreter process running in its own pod.
> In addition, you could set up multiple Zeppelin servers (deployed in the
> way mentioned above) and put a load balancer in front (Nginx, for
> instance).
> You can use a common notebook repository such as S3 or an NFS share (see
> the sketch below).
> Finally, if you want to enable scheduling, you need to ensure that only
> one of the servers has scheduling enabled, so that scheduled jobs run on
> a single instance.
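>
> A sketch of the common-repository part in zeppelin-env.sh terms (the
> bucket and user names are made up):
>
>   export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo
>   export ZEPPELIN_NOTEBOOK_S3_BUCKET=my-zeppelin-notebooks   # hypothetical bucket
>   export ZEPPELIN_NOTEBOOK_S3_USER=zeppelin                  # notebooks stored under this prefix
>
> Every server pointed at the same bucket then sees the same notebooks.
>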
> Hope it helps
> Carlos
>
> On Tue, 30 Mar 2021 at 11:57, Great Info wrote:
>
>> I see Zeppelin does not have cluster deployment support.
>> Is there any work to support Zeppelin access for many users?
>> Kindly share some links/guides if there is already a discussion or
>> solution around this.
>>
>> A simple use case: many users (at least 1k) want to access the SQL
>> interpreter (JDBC Postgres), write read queries, run some aggregation
>> on the result, and then chart it.
>>
>> Thanks,
>> gub
>>
> --
> Os meus cumprimentos / Best regards /  Mit freundlichen Grüße
> Carlos Diogo
>


-- 
Best Regards

Jeff Zhang


Re: zeppelin's py4j conflicts with spark's

2021-04-01 Thread Jeff Zhang
Could you share how you include pyspark's py4j in the python interpreter?

Rui Lu wrote on Thu, Mar 25, 2021 at 10:49 PM:

> Hi all,
>
> I'm trying to switch from the pyspark interpreter to the python
> interpreter and ran into weird py4j errors like “key error ‘x’” or
> “invalid command” when creating a Spark session.
>
> A little digging reveals that zeppelin stuffs its own py4j into the
> PYTHONPATH of the python interpreter; the value of PYTHONPATH I get from
> within a python-interpreter notebook is
>
> '/usr/lib/zeppelin/interpreter/python/py4j-0.9.2/src:/usr/lib/spark/python:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip'
>
> The parts under /usr/lib/spark were added to PYTHONPATH by me in
> zeppelin-env.sh. However, no matter how I arrange the order of the paths
> in PYTHONPATH, zeppelin somehow manages to put its own py4j first,
> preventing the python interpreter from finding the right py4j that comes
> with spark. I suppose zeppelin manipulates PYTHONPATH after loading
> zeppelin-env.sh.
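>
> A quick way to see which copy wins on this PYTHONPATH (a sketch; run on
> the interpreter host):
>
>   PYTHONPATH='/usr/lib/zeppelin/interpreter/python/py4j-0.9.2/src:/usr/lib/spark/python:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip' \
>     python -c 'import py4j; print(py4j.__file__)'
>
> With zeppelin's path first, as above, this prints the py4j-0.9.2 copy:
> zeppelin's py4j shadows spark's.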
>
> The reason I prefer the python interpreter is that I would like to have
> fine control over Spark params per notebook (whereas pyspark uses the
> same set of Spark params for all notebooks).
>
> Has anyone run into the same issue or is there a workaround for this?
>
> Thanks!
> --
> Lu Rui



-- 
Best Regards

Jeff Zhang


Re: Slack Channel Invite

2021-04-01 Thread Jeff Zhang
Sorry for the late response; I have sent you the invitation.

Danny Cranmer wrote on Tue, Mar 23, 2021 at 6:45 PM:

> Hello,
>
> Can you please invite me (dannycran...@apache.org) to the Zeppelin Slack
> channel?
>
> Thanks!
>


-- 
Best Regards

Jeff Zhang


Re: zeppelin's py4j conflicts with spark's

2021-04-01 Thread Rui Lu
Hi Jeff,

I added one line to zeppelin-env.sh:

export PYTHONPATH=/usr/lib/spark/python:/usr/lib/spark/python/lib/py4j-src.zip:${PYTHONPATH}
# note: spark's py4j is newer than zeppelin's

This modification is picked up; however, as far as I can tell, some script
in the python interpreter's initialisation then adds zeppelin's own py4j
anyway. I'm using Zeppelin 0.8.2 from AWS EMR.
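
For anyone who wants to track down where that injection happens, grepping
the installation is one way to start (a sketch; the paths assume the EMR
layout above):

  grep -rn "py4j" /usr/lib/zeppelin/bin /usr/lib/zeppelin/conf \
      /usr/lib/zeppelin/interpreter/python 2>/dev/null | grep -v "^Binary"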

Regards
-Lu Rui

