Hi all,

I am facing a strange issue on two different machines that act as servers. Each of them runs an instance of Zeppelin installed as a systemd service. The configuration is:

- Ubuntu Server 16.04.2 LTS
- Spark 2.1.0
- Microsoft R Open 3.3.2
- Zeppelin 0.7.1 (0.7.0 gave the same problems)

zeppelin-env.sh has the following settings:

    export SPARK_HOME="/spark/home/directory"

spark-env.sh has the following settings:

    export LANG="en_US"
    export SPARK_DAEMON_JAVA_OPTS+=" -Dspark.local.dir=/some/dir -Dspark.eventLog.dir=/some/dir/spark-events -Dhadoop.tmp.dir=/some/dir"
    export _JAVA_OPTIONS+=" -Djava.io.tmpdir=/some/dir"

spark-defaults.conf is set as:

    spark.executor.memory 21g
    spark.driver.memory 21g
    spark.python.worker.memory 4g
    spark.sql.autoBroadcastJoinThreshold 0

I use Spark in stand-alone mode and it works perfectly. It also works correctly with Zeppelin at first, but this is what happens:

1) Start Zeppelin on the server using the command: service zeppelin start
2) Connect to port 8080 using Mozilla Firefox from a client
3) Insert username and password (I enabled Shiro authentication)
4) Open a notebook
5) Execute the following code:

    %spark.r 2+2

6) The code runs correctly and I can see that R is currently running as a process.
7) Repeat steps 2-5 after some time (let’s say 2 or 3 hours): Zeppelin remains on “Running” forever or, if more time has elapsed since the last run (for example 1 day), it returns “Error”.

The “time-to-be-unresponsive” seems to be random and unpredictable. Also, once this happens, R is no longer present in the list of running processes. The Spark session remains active, because I can access the Spark UI on port 4040 and the application name is “Zeppelin”, so it is the Spark instance created by Zeppelin.

I observed that sometimes I can simply restart the interpreter from the Zeppelin UI, but many other times that doesn’t work and I have to restart Zeppelin (service zeppelin restart).

This issue affects both 0.7.0 and 0.7.1; I haven’t tried previous versions. It also happens if Zeppelin isn’t installed as a service.

I can’t provide more detail because I can’t see any error or warning in the logs. This is really strange.

Thank you all.

Kind regards,
Pietro Pugni