I like the idea of a leaner binary distribution. At the same time I agree with Jamie that the current binary is quite convenient and connection speeds should not be that big of a deal. Since the binary distribution is one of the first entry points for users, I'd like to keep it as user-friendly as possible.
What do you think about building a lean distribution by default and a "full" distribution that still bundles all the optional dependencies for releases? (If you don't think that's feasible I'm still +1 to only go with the "lean dist" approach.)

– Ufuk

On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen <se...@apache.org> wrote:
>
> There are some points where a leaner approach could help.
> There are many libraries and connectors that are currently being added to
> Flink, which makes the "include all" approach not completely feasible in
> the long run:
>
>   - Connectors: For a proper experience with the Shell/CLI (for example for
>     SQL) we need a lot of fat connector jars.
>     These often come in multiple versions, which alone accounts for 100s
>     of MBs of connector jars.
>   - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
>     themselves.
>   - The metric reporters are bit by bit growing as well.
>
> The following could be a compromise:
>
> The flink-dist would include
>   - the core flink libraries (core, apis, runtime, etc.)
>   - yarn / mesos etc. adapters
>   - examples (the examples should be a small set of self-contained programs
>     without additional dependencies)
>   - default logging
>   - default metric reporter (jmx)
>   - shells (scala, sql)
>
> The flink-dist would NOT include the following libs (and these would be
> offered for individual download)
>   - Hadoop libs
>   - the pre-shaded file systems
>   - the pre-packaged SQL connectors
>   - additional metric reporters
>
>
> On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <zjf...@gmail.com> wrote:
>
> > Thanks Chesnay for raising this discussion thread. I think there are 3
> > major use scenarios for the Flink binary distribution.
> >
> > 1. Use it to set up a standalone cluster
> > 2. Use it to experience features of Flink, such as via scala-shell,
> >    sql-client
> > 3. Downstream projects use it to integrate with their systems
> >
> > I did a size estimation of the flink dist folder: the lib folder takes
> > around 100 MB and the opt folder takes around 200 MB. Overall I agree to
> > make a thin flink dist. So the next problem is which components to drop.
> > I checked the opt folder, and I think the filesystem components and
> > metrics components could be moved out, because they are pluggable
> > components and are only used in scenario 1, I think (setting up a
> > standalone cluster). Other components like flink-table, flink-ml,
> > flink-gelly, we should still keep IMHO, because new users may still use
> > them to try the features of Flink. For me, scala-shell is the first
> > option to try new features of Flink.
> >
> >
> > On Fri, Jan 18, 2019 at 7:34 PM Fabian Hueske <fhue...@gmail.com> wrote:
> >
> >> Hi Chesnay,
> >>
> >> Thank you for the proposal.
> >> I think this is a good idea.
> >> We follow a similar approach already for Hadoop dependencies and
> >> connectors (although in application space).
> >>
> >> +1
> >>
> >> Fabian
> >>
> >> On Fri, Jan 18, 2019 at 10:59 AM Chesnay Schepler <ches...@apache.org>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> the binary distribution that we release by now contains quite a lot of
> >>> optional components, including various filesystems, metric reporters and
> >>> libraries. Most users will only use a fraction of these, and as such
> >>> they pretty much only increase the size of flink-dist.
> >>>
> >>> With Flink growing more and more in scope I don't believe it to be
> >>> feasible to ship everything we have with every distribution, and instead
> >>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
> >>> lean and additional components are downloaded separately and added by
> >>> the user.
> >>>
> >>> This would primarily affect the /opt directory, but could also be
> >>> extended to cover flink-dist. For example, the yarn and mesos code could
> >>> be spliced out into separate jars that could be added to lib manually.
> >>>
> >>> Let me know what you think.
> >>>
> >>> Regards,
> >>>
> >>> Chesnay
> >>>
> >
> > --
> > Best Regards
> >
> > Jeff Zhang