+1 for Stephan's suggestion. For example, SQL connectors have never been
part of the main distribution and nobody has complained about this so far. I
think what is more important than a big dist bundle is a helpful
"Downloads" page where users can easily find available filesystems,
connectors, and metric reporters. Not everyone checks Maven Central for
available JAR files. I just saw that we recently added an "Optional components"
section [1]; we just need to make it more prominent. This is
also done for the SQL connectors and formats [2].
[1] https://flink.apache.org/downloads.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#dependencies
Regards,
Timo
On 23.01.19 at 10:07, Ufuk Celebi wrote:
I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as possible.
What do you think about building a lean distribution by default and a
"full" distribution that still bundles all the optional dependencies
for releases? (If you don't think that's feasible I'm still +1 to only
go with the "lean dist" approach.)
– Ufuk
On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen <se...@apache.org> wrote:
There are some points where a leaner approach could help.
There are many libraries and connectors that are currently being added to
Flink, which makes the "include all" approach not completely feasible in
the long run:
- Connectors: For a proper experience with the Shell/CLI (for example for
SQL) we need a lot of fat connector jars.
These often come in multiple versions, which alone accounts for 100s
of MBs of connector jars.
- The pre-bundled FileSystems are also on the verge of adding 100s of MBs
themselves.
- The metric reporters are bit by bit growing as well.
The following could be a compromise:
The flink-dist would include
- the core flink libraries (core, apis, runtime, etc.)
- yarn / mesos etc. adapters
- examples (the examples should be a small set of self-contained programs
without additional dependencies)
- default logging
- default metric reporter (jmx)
- shells (scala, sql)
The flink-dist would NOT include the following libs (and these would be
offered for individual download)
- Hadoop libs
- the pre-shaded file systems
- the pre-packaged SQL connectors
- additional metric reporters
On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <zjf...@gmail.com> wrote:
Thanks Chesnay for raising this discussion thread. I think there are 3
major usage scenarios for the Flink binary distribution.
1. Use it to set up a standalone cluster
2. Use it to experience features of Flink, such as via scala-shell,
sql-client
3. Downstream projects use it to integrate with their systems
I did a size estimation of the flink dist folder: the lib folder takes around 100 MB
and the opt folder takes around 200 MB. Overall I agree with making flink dist thinner.
So the next question is which components to drop. I checked the opt folder,
and I think the filesystem components and metrics components could be moved
out, because they are pluggable components and are only used in scenario 1, I
think (setting up a standalone cluster). Other components like flink-table,
flink-ml, and flink-gelly we should still keep, IMHO, because new users may
still use them to try the features of Flink. For me, scala-shell is the first
option for trying new features of Flink.
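For anyone who wants to reproduce a size estimate like the one above, here is a minimal sketch in Python. It only assumes that the unpacked distribution has `lib` and `opt` subfolders; the function names `dir_size_bytes` and `report` are made up for illustration and are not part of Flink.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total

def report(dist_root, subdirs=("lib", "opt")):
    """Return a dict mapping each subfolder name to its total size in bytes.

    `dist_root` is the directory of an unpacked distribution (assumption:
    it contains the subfolders named in `subdirs`; missing ones count as 0).
    """
    return {d: dir_size_bytes(os.path.join(dist_root, d)) for d in subdirs}
```

Calling `report("/path/to/flink")` on an unpacked distribution and dividing the values by 1e6 gives the per-folder size in MB.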
On Fri, Jan 18, 2019 at 7:34 PM Fabian Hueske <fhue...@gmail.com> wrote:
Hi Chesnay,
Thank you for the proposal.
I think this is a good idea.
We follow a similar approach already for Hadoop dependencies and
connectors (although in application space).
+1
Fabian
On Fri, Jan 18, 2019 at 10:59 AM Chesnay Schepler <ches...@apache.org> wrote:
Hello,
the binary distribution that we currently release contains quite a lot of
optional components, including various filesystems, metric reporters and
libraries. Most users will only use a fraction of these, so for them the
rest pretty much only increases the size of flink-dist.
With Flink growing more and more in scope I don't believe it to be
feasible to ship everything we have with every distribution, and instead
suggest more of a "pick-what-you-need" model, where flink-dist is rather
lean and additional components are downloaded separately and added by
the user.
This would primarily affect the /opt directory, but could also be
extended to cover flink-dist. For example, the yarn and mesos code could
be spliced out into separate jars that could be added to lib manually.
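As a rough illustration of the "added by the user" step, the following Python sketch copies a separately downloaded jar into the `lib` folder of an unpacked distribution. The helper name `add_component` and any jar file name passed to it are hypothetical; the only assumption taken from this thread is that jars placed in `lib` are picked up by the distribution.

```python
import shutil
from pathlib import Path

def add_component(jar_path, dist_root):
    """Copy a downloaded jar into <dist_root>/lib and return the new path.

    Hypothetical helper for illustration only; it performs no download
    and no version or compatibility checks.
    """
    src = Path(jar_path)
    if not src.is_file():
        raise FileNotFoundError(f"jar not found: {src}")
    lib_dir = Path(dist_root) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)  # create lib if it is missing
    dst = lib_dir / src.name
    shutil.copy2(src, dst)  # copy the file and preserve its metadata
    return dst
```

The same could of course be done with a plain `cp`; the point is only that a lean distribution plus a manual copy step is all the user-facing workflow would need.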
Let me know what you think.
Regards,
Chesnay
--
Best Regards
Jeff Zhang