There are some points where a leaner approach could help. Many libraries and connectors are currently being added to Flink, which makes the "include all" approach infeasible in the long run:
- Connectors: For a proper experience with the Shell/CLI (for example for SQL) we need a lot of fat connector jars. These often come in multiple versions, which alone accounts for 100s of MBs of connector jars.
- The pre-bundled FileSystems are also on the verge of adding 100s of MBs themselves.
- The metric reporters are growing bit by bit as well.

The following could be a compromise:

The flink-dist would include
- the core flink libraries (core, apis, runtime, etc.)
- yarn / mesos etc. adapters
- examples (the examples should be a small set of self-contained programs without additional dependencies)
- default logging
- default metric reporter (jmx)
- shells (scala, sql)

The flink-dist would NOT include the following libs (and these would be offered for individual download)
- Hadoop libs
- the pre-shaded file systems
- the pre-packaged SQL connectors
- additional metric reporters

On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <zjf...@gmail.com> wrote:

> Thanks Chesnay for raising this discussion thread. I think there are 3
> major use scenarios for the flink binary distribution.
>
> 1. Use it to set up a standalone cluster
> 2. Use it to experience features of flink, such as via scala-shell,
> sql-client
> 3. Downstream projects use it to integrate with their systems
>
> I did a size estimation of the flink dist folder: the lib folder takes
> around 100M and the opt folder takes around 200M. Overall I agree to make
> a thin flink dist. So the next problem is which components to drop. I
> checked the opt folder, and I think the filesystem components and metrics
> components could be moved out, because they are pluggable components and
> are only used in scenario 1, I think (setting up a standalone cluster).
> Other components like flink-table, flink-ml, flink-gelly we should still
> keep IMHO, because new users may still use them to try the features of
> flink. For me, scala-shell is the first option to try new features of
> flink.
>
> Fabian Hueske <fhue...@gmail.com> wrote on Fri, Jan 18, 2019 at 7:34 PM:
>
>> Hi Chesnay,
>>
>> Thank you for the proposal.
>> I think this is a good idea.
>> We follow a similar approach already for Hadoop dependencies and
>> connectors (although in application space).
>>
>> +1
>>
>> Fabian
>>
>> On Fri, Jan 18, 2019 at 10:59 AM Chesnay Schepler <
>> ches...@apache.org> wrote:
>>
>>> Hello,
>>>
>>> the binary distribution that we currently release contains quite a lot
>>> of optional components, including various filesystems, metric reporters
>>> and libraries. Most users will only use a fraction of these, which as
>>> such pretty much only increase the size of flink-dist.
>>>
>>> With Flink growing more and more in scope, I don't believe it to be
>>> feasible to ship everything we have with every distribution, and instead
>>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
>>> lean and additional components are downloaded separately and added by
>>> the user.
>>>
>>> This would primarily affect the /opt directory, but could also be
>>> extended to cover flink-dist. For example, the yarn and mesos code could
>>> be spliced out into separate jars that could be added to lib manually.
>>>
>>> Let me know what you think.
>>>
>>> Regards,
>>>
>>> Chesnay
>>>
>
> --
> Best Regards
>
> Jeff Zhang
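From a user's perspective, the "pick-what-you-need" model discussed in this thread would amount to dropping downloaded jars into the distribution's lib/ directory. A minimal sketch of that workflow is below; the jar names and layout are illustrative assumptions (the download is simulated locally rather than fetched with e.g. curl):

```shell
set -e

# A lean flink-dist would ship only the core libraries in lib/
# (jar names here are hypothetical placeholders).
DIST=./flink-dist
mkdir -p "$DIST/lib" "$DIST/opt"
touch "$DIST/lib/flink-core.jar" "$DIST/lib/flink-runtime.jar"

# Optional components (pre-shaded filesystems, SQL connectors, metric
# reporters) would be downloaded separately by the user; simulated here
# by creating the file locally instead of fetching it.
touch flink-s3-fs-hadoop.jar

# The user adds the component by dropping the jar into lib/.
cp flink-s3-fs-hadoop.jar "$DIST/lib/"

ls "$DIST/lib"
```

The point of the design is that the base distribution stays small, while each optional component remains a single self-contained jar that can be installed with a plain copy, no build step required.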