GitHub user AhyoungRyu commented on the issue:

https://github.com/apache/zeppelin/pull/1339

@bzz Thank you for such a precise comment! Let me go through your feedback one by one (just to make it clear) :)

1. > /.spark-dist/ is under cache on TravisCI which is S3 bucket that gets synced automatically with the content of this folder while running a build.

Right, that's my bad. I'll change the directory to another one. How about `ZEPPELIN_HOME/interpreter/spark/`, as before?

2, 3, 4. > what is the benefit and what problem does this change solves?

I tried to describe the current problem and the advantage of this change in the Jira issue and the PR description, but I guess I didn't do it well. I should have explained it more clearly. Let me explain here with actual numbers. (I'll update the Jira issue and the PR description as well.)

- **What was the problem?**

As you said above, the main problem is the Zeppelin binary package size. The package sizes for the latest Zeppelin release were:

```
zeppelin-0.6.1-bin-all.tgz: 517MB
zeppelin-0.6.1-bin-netinst.tgz: 236MB
```

Didn't we have to ask the ASF infra team(?) at every release because of Zeppelin's huge package size?

- **What is the benefit?**

When I created the binary packages without `spark-dependencies`, the package sizes were:

```
zeppelin-0.6.1-bin-all.tgz: 344MB
zeppelin-0.6.1-bin-netinst.tgz: 64MB
```

As you can see, the size difference in both cases is about `170MB` (517 - 344 = 173 and 236 - 64 = 172)!

Moreover, users don't need to type build profiles such as `-Pr` or `-Psparkr`. I have seen many users who try to use `%sparkr` in Zeppelin hit an NPE because they didn't build with `-Psparkr`. It's truly confusing, perhaps because they don't know the Maven build mechanism well. With this change, they don't need to know about the complicated Maven build profiles.

5. > Also regarding user experience - while running zeppelin-demon.sh user does not usually expect it to be network-dependant and download 100Mb archives - is there at least a user notification\progress indicator

So far, I have just added the line below, which is shown in the console after users start `zeppelin-daemon.sh`:

```
echo "There is no SPARK_HOME in your system. After successful Spark bin installation, Zeppelin will be started."
```

Then it starts downloading the Spark binary from the mirror site. I'm planning to add a description to the README, since we already provide a lot of build profile information there. I also agree there must be a better way to notify users than just writing "We will download 100MB Spark binary package if you don't set SPARK_HOME yet" in the README.

After I first came up with removing `spark-dependencies` to reduce the Zeppelin binary package size, I spent a long time thinking about how we can seamlessly replace the existing way of providing embedded Spark in Zeppelin. Please regard this PR as a first step. I would appreciate it if you could share your ideas about this issue! :)
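For reference, here is a minimal sketch of the kind of `SPARK_HOME` check and console notice described in point 5. It is only an illustration, not the actual code in this PR: the function names `needs_spark_download` and `notify_spark_download` and the variable `SPARK_ARCHIVE_SIZE_MB` are hypothetical, and the sketch only prints the notice without downloading anything.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual zeppelin-daemon.sh logic.

# Approximate archive size mentioned in the discussion (assumed, for the notice only).
SPARK_ARCHIVE_SIZE_MB=100

# Returns 0 (true) when no usable Spark installation was given,
# i.e. the argument is empty or is not an existing directory.
needs_spark_download() {
  local spark_home="${1:-}"
  if [ -n "${spark_home}" ] && [ -d "${spark_home}" ]; then
    return 1
  fi
  return 0
}

# Tells the user up front whether a download is about to happen.
notify_spark_download() {
  local spark_home="${1:-}"
  if needs_spark_download "${spark_home}"; then
    echo "There is no SPARK_HOME in your system."
    echo "Zeppelin will download a Spark binary package (about ${SPARK_ARCHIVE_SIZE_MB}MB) before starting."
  else
    echo "Using existing Spark installation at ${spark_home}."
  fi
}

# Example call using the caller's environment (empty if SPARK_HOME is unset).
notify_spark_download "${SPARK_HOME:-}"
```

Keeping the check separate from the message makes it easy to reuse the same condition for the download step itself. If the download were done with `curl`, its `--progress-bar` option would be one possible way to show progress, but that part is left out of this sketch.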