GitHub user AhyoungRyu reopened a pull request: https://github.com/apache/zeppelin/pull/1339
[ZEPPELIN-1332] Remove spark-dependencies & suggest new way

### What is this PR for?
Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`. For those who **build Zeppelin from source**, this Spark is downloaded when they build the source with [build profiles](https://github.com/apache/zeppelin#spark-interpreter). I think these various build profiles are useful for customizing the embedded Spark, but many Spark users use their own Spark rather than Zeppelin's embedded one. Nowadays only Spark and Zeppelin beginners use the embedded Spark, and for them there are too many build profiles (I think it's too complicated). In the case of the **Zeppelin binary package**, the embedded Spark is included by default under `interpreter/spark/`. That's why the Zeppelin package is so large.

#### New suggestions
This PR changes the mechanism for downloading the embedded Spark binary as follows:

1. Run `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`.
2. This creates `ZEPPELIN_HOME/local-spark/`, downloads `spark-2.0.1-bin-hadoop2.7.tgz` into it, and untars it.
3. We can then use this local Spark without any configuration, as before (e.g. without setting `SPARK_HOME`).

### What type of PR is it?
Improvement

### Todos
- [x] trap `ctrl+c` & `ctrl+z` key interruption while downloading Spark
- [x] test on different OSes
- [x] update related document pages again after getting feedback

### What is the Jira issue?
[ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)

### How should this be tested?
1. `rm -r spark-dependencies`
2. Apply this patch and build with `mvn clean package -DskipTests`.
3. Try `bin/zeppelin-daemon.sh get-spark` or `bin/zeppelin.sh get-spark`.
4. You should be able to run `sc.version` without setting an external `SPARK_HOME`.

### Screenshots (if appropriate)
- `./bin/zeppelin-daemon.sh get-spark`

```
$ ./bin/zeppelin-daemon.sh get-spark
Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  178M  100  178M    0     0  10.4M      0  0:00:17  0:00:17 --:--:-- 10.2M

spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/ahyoungryu/Dev/zeppelin-development/zeppelin/local-spark
```

- if `ZEPPELIN_HOME/local-spark/spark-2.0.1-bin-hadoop2.7` already exists

```
$ ./bin/zeppelin-daemon.sh get-spark
spark-2.0.1-bin-hadoop2.7 already exists under local-spark.
```

### Questions:
- Do the license files need an update? No
- Are there breaking changes for older versions? No
- Does this need documentation? Yes, some related documents need to be updated (e.g. README.md, spark.md and install.md?)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339

----
commit d377cc6f28dd6cae43364f61135ed8abcba3b269
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-08-16T15:08:19Z

    Fix typo comment in interpreter.sh

commit 4f3edfd87e84e65789e0e937b5330c16442fcfbe
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-08-17T01:52:06Z

    Remove spark-dependencies

commit 99ef019521ca1fd0fc41958b20da8642773825d5
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-08-17T07:14:35Z

    Add spark-2.*-bin-hadoop* to .gitignore

commit 4e8d5ff067c5428a5254e45b4de533c56393f7b4
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-08-17T15:22:25Z

    Add download-spark.sh file

commit 6784015b8da439894dd09bbc3e54477a0f3cba84
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-08-17T15:28:51Z

    Remove useless comment line in common.sh

commit c866f0b231432b14c092a365d270e81a2222f54a
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-08-18T03:32:11Z

    Remove zeppelin-spark-dependencies from r/pom.xml

commit 3fe19bff1bdbdccba63e3163bd7aabfe23a35777
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-08-21T05:38:55Z

    Change SPARK_HOME with proper message

commit 99545233c0e84f48fbf98da25ad131eeba6dd293
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-06T08:55:20Z

    Check interpreter/spark/ instead of SPARK_HOME

commit e6973b3887e9c0d50a1168f26e6f0337f9f78986
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-06T08:55:40Z

    Refactor download-spark.sh

commit 552185ac03f1b5edc9fabb4d381d471c59078903
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-07T07:48:15Z

    Revert: remove spark-dependencies

commit ffe64d9b264ab3db67d28a045e34c9c4d471058a
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-07T13:23:11Z

    Remove useless ZEPPELIN_HOME

commit 5ed33112d64dc3063a29d515d4987e193a909dd0
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-08T05:51:40Z

    Change dir of Spark bin to 'local-spark'

commit 1419f0b8d76a8e15ac7646e3827dd536246038d1
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-08T06:07:20Z

    Set timeout for travis test

commit a813d922ba29b5c392a908c3199050884266b969
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-08T06:16:54Z

    Add license header to download-spark.cmd

commit 368c15aefd650a59c6fb0fdd040efe1bbb2618cc
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-08T11:48:43Z

    Fix wrong check condition in common.sh

commit e58075d046f65ae173fecc31c0b648b87f445af4
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-08T13:14:29Z

    Add travis condition to download-spark.sh

commit 89be91b049a646b1a0fc7dcfeb5e8bfde68bdab4
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-12T05:42:29Z

    Remove bin/download-spark.cmd again

commit b22364ddba120842933e96eca1e082680cd5407a
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-12T16:25:31Z

    Remove spark-dependency profiles & reorganize some titles in README.md

commit 24dc95faa39586be323365f21a2beb1f683becf8
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-12T18:30:41Z

    Update spark.md to add a guide for local-spark mode

commit 2537fa14d5e13c34be9eeab932bf5dc853bda5d4
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-12T18:49:49Z

    Remove '-Ppyspark' build options

commit ca534e596c36ced04f832b0a7ab7e78e951929e1
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-13T08:09:18Z

    Remove useless creating .bak file process

commit edd525d0f6eac0a956bc64f58e77ac3afc423f58
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-13T11:21:10Z

    Update install.md & spark.md

commit a9b110a809463ac1795e76a30b9cd2df6c40292d
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-14T09:35:37Z

    Resolve 'sed' command issue between OSX & Linux

commit f383d3afb8f9e2c1e240f69d8d970c469d0a9ced
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-14T11:20:31Z

    Trap ctrl+c during downloading Spark

commit 527ef5b6518d3477d9731422cad190a59df11d1e
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Date:   2016-09-14T11:26:56Z

    Remove useless condition

commit 555372a655b788b3b0fdd85d430b6f063ce13834
Author: AhyoungRyu <ahyoung...@apache.org>
Date:   2016-09-20T17:05:16Z

    Make local spark mode with zero-configuration as @moon suggested

commit de87cb2adf5ad510a712e4f696ae127c7a414077
Author: AhyoungRyu <ahyoung...@apache.org>
Date:   2016-09-22T14:20:31Z

    Modify SparkRInterpreter.java to enable SparkR without SPARK_HOME

commit 1dd51d8e1dcb8d65e22a1cc67a5d089c5d7c196b
Author: AhyoungRyu <ahyoung...@apache.org>
Date:   2016-09-22T17:01:40Z

    Remove duplicated variable declaration

commit f068bef554507e7125865f77816986d5b085a7b3
Author: AhyoungRyu <ahyoung...@apache.org>
Date:   2016-09-22T17:02:01Z

    Update related docs again

commit 437f2063a39d2a7a583bb647cb885e51a0990098
Author: AhyoungRyu <ahyoung...@apache.org>
Date:   2016-09-23T05:37:57Z

    Fix typo in SparkRInterpreter.java

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
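
For readers following the "New suggestions" steps in the PR description, the `get-spark` flow (skip if the directory already exists, otherwise create `local-spark/`, download, untar, and clean up on interruption) might look roughly like the sketch below. This is not the PR's actual `bin/download-spark.sh`: the function names `get_spark` and `fetch_archive`, the `SPARK_URL` variable, and the choice of the Apache archive as download source are all assumptions made for illustration, and only `INT`/`TERM` are trapped here.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and the download URL are assumptions,
# not taken from the PR's download-spark.sh.

SPARK_VERSION="${SPARK_VERSION:-2.0.1}"
HADOOP_VERSION="${HADOOP_VERSION:-2.7}"
SPARK_NAME="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
# Assumption: download from the Apache archive rather than a mirror.
SPARK_URL="${SPARK_URL:-https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_NAME}.tgz}"

# Download URL $1 to file $2 (hypothetical helper, kept separate so it can be swapped).
fetch_archive() {
  curl -fL -o "$2" "$1"
}

# Download and unpack Spark under <zeppelin home>/local-spark/.
get_spark() {
  local zeppelin_home="${1:-${ZEPPELIN_HOME:-$PWD}}"
  local target_dir="${zeppelin_home}/local-spark"
  local archive="${target_dir}/${SPARK_NAME}.tgz"

  # Nothing to do if a previous run already unpacked this version.
  if [ -d "${target_dir}/${SPARK_NAME}" ]; then
    echo "${SPARK_NAME} already exists under local-spark."
    return 0
  fi

  mkdir -p "${target_dir}" || return 1

  # Run in a subshell so the trap and 'exit' do not affect the caller.
  (
    # On ctrl+c (INT) or TERM, remove partial files before exiting.
    trap 'rm -rf "${archive}" "${target_dir}/${SPARK_NAME}"; exit 130' INT TERM

    echo "Download ${SPARK_NAME}.tgz ..."
    if fetch_archive "${SPARK_URL}" "${archive}" &&
       tar -xzf "${archive}" -C "${target_dir}"; then
      rm -f "${archive}"
      echo "${SPARK_NAME} is successfully downloaded and saved under ${target_dir}"
    else
      # Failed download or extraction: leave no half-finished state behind.
      rm -rf "${archive}" "${target_dir}/${SPARK_NAME}"
      exit 1
    fi
  )
}
```

Calling `get_spark /path/to/zeppelin` would then populate `/path/to/zeppelin/local-spark/`; a second call finds the unpacked directory and returns without downloading again, matching the "already exists" behavior shown in the PR's second screenshot.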