Re: [DISCUSS] Release package size

moon soo Lee Fri, 20 Jan 2017 00:58:16 -0800

Hi,

I think we need to have a policy to decide which interpreters go into the
zeppelin-bin-min package, and make applying that policy a part of the
release process.
Right now I cannot see any consistent rule except for "it seems" or "I
guess". And I have no idea how I would explain it if somebody asks 'why is
python not in the min package?' or 'why is xxx not in the min package?'.


If we really want a min package, we must have a policy that gives everyone
the same expectation about which interpreters go into the min package and
which do not. Once we agree on a policy we can make it part of the release
process.

So, why don't we try to define the policy together? Here are some ideas I
can throw out.

 a. Min package includes interpreters whose binary size is less than 10MB.
 b. Min package includes interpreters with 5 or more JIRA issues created
per month.
 c. Min package includes/excludes interpreters that the community decides
on via formal vote.

"10MB" and "5 or more" are numbers I just made up. We can change them to
more reasonable numbers.
Also, a, b and c are only possible examples. We can refine them, we can use
only one, we can use all three, we can add more.
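
To make the idea concrete, here is a rough sketch of how rules a, b and c
could be checked mechanically during a release. Nothing here is agreed: the
way the rules are combined (a vote overrides, otherwise a OR b), the names
Interpreter and in_min_package, and the example entries are all assumptions
for illustration, and the thresholds are the same made-up numbers as above.

```python
from dataclasses import dataclass

# Placeholder thresholds from the proposal; both numbers are provisional.
MAX_BINARY_SIZE_MB = 10
MIN_JIRA_ISSUES_PER_MONTH = 5


@dataclass(frozen=True)
class Interpreter:
    """Hypothetical record describing one interpreter at release time."""
    name: str
    binary_size_mb: float
    jira_issues_per_month: float


def in_min_package(interp, voted_in=frozenset(), voted_out=frozenset()):
    """Return True if the interpreter would ship in the min package.

    Assumed combination of the rules: (c) a formal vote wins,
    otherwise (a) small binary OR (b) enough JIRA activity.
    """
    # Rule (c): an explicit community vote overrides the numeric rules.
    if interp.name in voted_out:
        return False
    if interp.name in voted_in:
        return True
    # Rule (a): binary size below the limit.
    small_enough = interp.binary_size_mb < MAX_BINARY_SIZE_MB
    # Rule (b): at least the minimum number of JIRA issues per month.
    active_enough = interp.jira_issues_per_month >= MIN_JIRA_ISSUES_PER_MONTH
    return small_enough or active_enough


# Made-up entries, not real measurements of any Zeppelin interpreter.
candidates = [
    Interpreter("example-small", 4.0, 0),
    Interpreter("example-busy", 80.0, 7),
    Interpreter("example-large", 80.0, 1),
]
selected = [i.name for i in candidates if in_min_package(i)]
print(selected)
```

Whether a and b should be joined with OR or AND is exactly the kind of
detail the policy would need to pin down.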

My point is, we need to give everyone the same expectation about which
interpreters go into the min package and which do not.
What do you think?

Thanks,
moon

On Thu, Jan 19, 2017 at 12:47 AM Mina Lee <mina...@apache.org> wrote:

> Thank you for sharing your opinions, guys.
>
> I like Eric's approach.
> We are planning to provide an official Docker image managed by the
> community. There is ongoing work [1] around it, and I can focus on this
> after the 0.7.0 release.
>
> It seems that the majority prefers a binary package with the most-used
> interpreters, such as spark, md, jdbc.
> I think we can gradually move to providing only the netinst package once
> docker is ready.
> For the upcoming 0.7.0 release, I'd like to distribute two binary packages:
>   - zeppelin-bin-min(spark, jdbc, md)
>   - zeppelin-bin-netinst(spark only)
>
> [1] https://github.com/apache/zeppelin/pull/1761
>
> Thanks,
> Mina
>
> On Thu, Jan 19, 2017 at 1:57 AM Jongyoul Lee <jongy...@gmail.com> wrote:
>
> I'd like to deploy netinst only. And it's a good idea for Apache Zeppelin
> to support an official docker image with all possible interpreters.
>
> On Wed, Jan 18, 2017 at 7:42 PM, Eric Pugh <
> ep...@opensourceconnections.com> wrote:
>
> Can I throw out an alternate approach?  I feel like the key value of the
> “-all” option is to simplify the life of someone who is new to Zeppelin.
> If you’re a sophisticated Zeppelin user, then picking and choosing
> interpreters is easy, and you grok why you want to do that….
>
> However, for myself, when I want to demo Zeppelin, I go straight to one of
> the Docker images, specifically
> https://github.com/dylanmei/docker-zeppelin because it bundles in
> everything.
>
> Would providing a similar Docker image on the “Get Zeppelin” page that
> bundles in all the dependencies and interpreters solve the “how do I try
> Zeppelin in 5 minutes” challenge?  The “Get Zeppelin” page is a rather
> daunting page!
>
> Eric
>
>
> On Jan 18, 2017, at 12:00 AM, Mohit Jaggi <mohitja...@gmail.com> wrote:
>
>  Including ALL interpreters is not feasible, not due to download size as
> that is easily increased but because we wouldn't want to couple the release
> cycles as pointed out by Jeff. IMHO a few of the most popular ones should
> be included. Yes it is just one extra step but if a computer can do it why
> make a human suffer? :-)
> Re: spark-packages, Spark does include important and mature functionality
> in its assembly e.g. Csv parser was merged into core spark when it matured.
> I believe Z should do the same.
>
>
> On Jan 17, 2017, at 8:05 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>
> Another thing I'd like to discuss is whether we should move most of the
> interpreters out of the zeppelin project to somewhere else, just as spark
> does with spark-packages. There are 2 benefits:
>
> 1. Keep the zeppelin project much smaller
> 2. Each interpreter's improvements won't be blocked by the release of
> zeppelin. Each interpreter can have its own release cycle as long as
> zeppelin-interpreter doesn't break compatibility.
>
> If it makes sense, I can open another thread to discuss it.
>
>
>
>
> On Wed, Jan 18, 2017 at 11:55 AM, Jun Kim <i2r....@gmail.com> wrote:
>
> +1 for Jeff's idea! I also use the three interpreters mainly :)
>
> On Wed, Jan 18, 2017 at 12:52 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>
> How about also including the markdown and jdbc interpreters, if this won't
> make the binary distribution much bigger? I guess spark, markdown, and jdbc
> are the top 3 interpreters in zeppelin.
>
>
>
> On Wed, Jan 18, 2017 at 11:33 AM, Ahyoung Ryu <ahyoung...@apache.org> wrote:
>
> Thanks Mina always!
> +1 for releasing only netinst package.
>
> On Wed, Jan 18, 2017 at 12:29 PM, Prabhjyot Singh <
> prabhjyotsi...@apache.org> wrote:
>
> +1
>
> I don't think it's a problem now, but if it keeps increasing then in
> subsequent releases we can ship Zeppelin with a few interpreters, and mark
> the others as plugins that can be downloaded later, with instructions on
> how to configure them.
>
> On Jan 18, 2017 8:54 AM, "Jun Kim" <i2r....@gmail.com> wrote:
>
> +1
>
> I think it won't be a problem if we give clear notice.
> Maybe we can do that next to the download button here (
> http://zeppelin.apache.org/download.html)
> A message may be "NOTE: only the spark interpreter is included since 0.7.0.
> If you want other interpreters, please see the interpreter installation
> guide"
>
> On Wed, Jan 18, 2017 at 12:14 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>
> +1, we should also mention it in the release notes and in the 0.7 docs
>
>
>
> On Wed, Jan 18, 2017 at 11:12 AM, Mina Lee <mina...@apache.org> wrote:
>
> Hi all,
>
> Zeppelin is about to start the 0.7.0 release process, and I would like to
> discuss binary package distribution.
>
> Every time we distribute a new binary package, the size of the
> zeppelin-0.x.x-bin-all.tgz package gets bigger:
>    - zeppelin-0.6.0-bin-all.tgz: 506M
>    - zeppelin-0.6.1-bin-all.tgz: 517M
>    - zeppelin-0.6.2-bin-all.tgz: 547M
>    - zeppelin-0.7.0-bin-all.tgz: 720M (Expected)
>
> Mostly it is because the number of interpreters supported by zeppelin
> keeps growing,
> and there is a high chance that we will support more interpreters in the
> near future.
> So instead of asking the apache infra team to increase the limit,
> I would like to suggest that, from the 0.7.0 release, we provide only
> zeppelin-0.7.0-bin-netinst.tgz, which includes only the spark interpreter.
> One concern is that users need one more step to install the interpreters
> they use,
> but I believe it can be done easily with a single command [1].
>
> FYI, I'm attaching the link to a similar discussion [2] we had last June
> on the mailing list.
>
> Regards,
> Mina
>
> [1]
> http://zeppelin.apache.org/docs/0.6.2/manual/interpreterinstallation.html#install-specific-interpreters
> [2]
> https://lists.apache.org/thread.html/4b54c034cf8d691655156e0cb647243180c57a6829d97aa3c085b63c@%3Cusers.zeppelin.apache.org%3E
>
> --
> Taejun Kim
>
> Data Mining Lab.
> School of Electrical and Computer Engineering
> University of Seoul
>
>
>
>
>
> _______________________
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com | My Free/Busy
> <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>
