+1 for "slim" and "fat" solution. One comment about the fat one, I think we need to put all needed jars into /lib (or /plugins). Put jars into /opt and relying on users moving them from /opt to /lib doesn't really improve the out-of-box experience.
Best,
Kurt

On Fri, Apr 24, 2020 at 8:28 PM Aljoscha Krettek <aljos...@apache.org> wrote:

re (1): I don't know about that; probably the people that did the metrics reporter plugin support had some thoughts about that.

re (2): I agree, that's why I initially suggested to split it into "slim" and "fat": our current "medium fat" selection of jars in Flink dist does not serve anyone too well. It's too fat for people that want to build lean application images. It's too lean for people that want a good first out-of-box experience.

Aljoscha

On 17.04.20 16:38, Stephan Ewen wrote:

@Aljoscha I think that is an interesting line of thinking. The swift-fs may be rarely enough used to move it to an optional download.

I would still drop two more thoughts:

(1) Now that we have plugins support, is there a reason to have a metrics reporter or file system in /opt instead of /plugins? They don't spoil the class path any more.

(2) I can imagine there still being a desire to have a "minimal" docker file, for users that want to keep the container images as small as possible, to speed up deployment. It is fine if that would not be the default, though.

On Fri, Apr 17, 2020 at 12:16 PM Aljoscha Krettek <aljos...@apache.org> wrote:

I think having such tools and/or tailor-made distributions can be nice, but I also think the discussion is missing the main point: the initial observation/motivation is that apparently a lot of users (Kurt and I talked about this) on the Chinese DingTalk support groups and other support channels have problems when first using the SQL client because of these missing connectors/formats. For these users, having additional tools would not solve anything because they would also not take that extra step. I think that even tiny friction should be avoided, because the annoyance from it accumulates over the (hopefully) many users that we want to have.

Maybe we should take a step back from discussing the "fat"/"slim" idea and instead think about the composition of the current dist. As mentioned, we have these jars in opt/:

    17M  flink-azure-fs-hadoop-1.10.0.jar
    52K  flink-cep-scala_2.11-1.10.0.jar
    180K flink-cep_2.11-1.10.0.jar
    746K flink-gelly-scala_2.11-1.10.0.jar
    626K flink-gelly_2.11-1.10.0.jar
    512K flink-metrics-datadog-1.10.0.jar
    159K flink-metrics-graphite-1.10.0.jar
    1.0M flink-metrics-influxdb-1.10.0.jar
    102K flink-metrics-prometheus-1.10.0.jar
    10K  flink-metrics-slf4j-1.10.0.jar
    12K  flink-metrics-statsd-1.10.0.jar
    36M  flink-oss-fs-hadoop-1.10.0.jar
    28M  flink-python_2.11-1.10.0.jar
    22K  flink-queryable-state-runtime_2.11-1.10.0.jar
    18M  flink-s3-fs-hadoop-1.10.0.jar
    31M  flink-s3-fs-presto-1.10.0.jar
    196K flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
    518K flink-sql-client_2.11-1.10.0.jar
    99K  flink-state-processor-api_2.11-1.10.0.jar
    25M  flink-swift-fs-hadoop-1.10.0.jar
    160M opt

The "filesystem" connectors are the heavy hitters there.
I downloaded most of the SQL connectors/formats and this is what I got:

    73K  flink-avro-1.10.0.jar
    36K  flink-csv-1.10.0.jar
    55K  flink-hbase_2.11-1.10.0.jar
    88K  flink-jdbc_2.11-1.10.0.jar
    42K  flink-json-1.10.0.jar
    20M  flink-sql-connector-elasticsearch6_2.11-1.10.0.jar
    2.8M flink-sql-connector-kafka_2.11-1.10.0.jar
    24M  sql-connectors-formats

We could just add these to the Flink distribution without blowing it up by much. We could drop any of the existing "filesystem" connectors from opt, add the SQL connectors/formats, and not change the size of Flink dist. So maybe we should do that instead?

We would need some tooling for the sql-client shell script to pick up the connectors/formats from opt/, because we don't want to add them to lib/. We're already doing that for finding the flink-sql-client jar, which is also not in lib/.
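Just to sketch the idea (this is not the real script; the wrapper and the wildcard patterns are only illustrative, and it simply reuses the SQL Client's existing --jar option):

    #!/usr/bin/env bash
    # sketch: hand the connectors/formats that ship in opt/ to the SQL Client
    # via its existing --jar option, so nothing has to be copied into lib/
    FLINK_HOME="$(cd "$(dirname "$0")/.." && pwd)"
    JAR_ARGS=()
    for jar in "$FLINK_HOME"/opt/flink-sql-connector-*.jar \
               "$FLINK_HOME"/opt/flink-csv-*.jar \
               "$FLINK_HOME"/opt/flink-json-*.jar \
               "$FLINK_HOME"/opt/flink-avro-*.jar; do
      [ -f "$jar" ] && JAR_ARGS+=(--jar "$jar")
    done
    exec "$FLINK_HOME"/bin/sql-client.sh embedded "${JAR_ARGS[@]}" "$@"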
What do you think?

Best,
Aljoscha

On 17.04.20 05:22, Jark Wu wrote:

Hi,

I like the idea of a web tool to assemble a fat distribution, and https://code.quarkus.io/ looks very nice. All the users need to do is just select what they need (I think this step can't be omitted anyway). We can also provide a default fat distribution on the web which pre-selects some popular connectors.

Best,
Jark

On Fri, 17 Apr 2020 at 02:29, Rafi Aroch <rafi.ar...@gmail.com> wrote:

As a reference for a nice first experience I had, take a look at https://code.quarkus.io/ — you reach this page after you click "Start Coding" on the project homepage.

Rafi

On Thu, Apr 16, 2020 at 6:53 PM Kurt Young <ykt...@gmail.com> wrote:

I'm not saying that pre-bundling some jars will make this problem go away, and you're right that it only hides the problem for some users. But what if this solution can hide the problem for 90% of users? Wouldn't that be good enough for us to try?

Regarding "would users following instructions really be such a big problem?": I'm afraid yes. Otherwise I wouldn't have answered such questions at least a dozen times, and I wouldn't see such questions coming up from time to time. During some periods, I even saw such questions every day.

Best,
Kurt

On Thu, Apr 16, 2020 at 11:21 PM Chesnay Schepler <ches...@apache.org> wrote:

The problem with having a distribution with "popular" stuff is that it doesn't really *solve* a problem, it just hides it for users who fall into these particular use-cases. Move out of them and you once again run into the exact same problems outlined above.

This is exactly why I like the tooling approach; you have to deal with it from the start, and transitioning to a custom use-case is easier.

Would users following instructions really be such a big problem? I would expect that users generally know *what* they need, just not necessarily how it is assembled correctly (where to get which jar, which directory to put it in). It seems like these are exactly the problems this would solve? I just don't see how moving a jar corresponding to some feature from opt to some directory (lib/plugins) is less error-prone than just selecting the feature and having the tool handle the rest.

As for re-distributions, it depends on the form that the tool would take. It could be an application that runs locally and works against Maven Central (note: not necessarily *using* Maven); this should work in China, no?

A web tool would of course be fancy, but I don't know how feasible this is with the ASF infrastructure. You wouldn't be able to mirror the distribution, so the load can't be distributed. I doubt INFRA would like this.

Note that third parties could also start distributing use-case-oriented distributions, which would be perfectly fine as far as I'm concerned.

On 16/04/2020 16:57, Kurt Young wrote:

I'm not so sure about the web tool solution though. The concern I have with this approach is that the final generated distribution is kind of non-deterministic. We might generate too many different combinations when users try to package different types of connectors, formats, and maybe even Hadoop releases. As far as I can tell, most open source and Apache projects only release a few pre-defined distributions, which most users are already familiar with and which are thus hard to change IMO. I have also seen cases where users re-distribute the release package because of unstable network access to the Apache website from China. With a web tool solution, I don't think this kind of re-distribution would be possible anymore.

In the meantime, I also have a concern that we will fall into our trap again if we try to offer this smart & flexible solution, because it needs users to cooperate with such a mechanism. It's exactly the situation we currently fell into:
1. We offered a smart solution.
2. We hope users will follow the correct instructions.
3. Everything will work as expected if users followed the right instructions.

In reality, I suspect not all users will do the second step correctly. And for new users who are only trying to have a quick experience with Flink, I would bet most will do it wrong.

So, my proposal would be one of the following 2 options:
1. Provide a slim distribution for advanced production users and provide a distribution which has some popular built-in jars.
2. Only provide a distribution which has some popular built-in jars.

If we are trying to reduce the distributions we release, I would prefer 2.

Best,
Kurt

On Thu, Apr 16, 2020 at 9:33 PM Till Rohrmann <trohrm...@apache.org> wrote:

I think what Chesnay and Dawid proposed would be the ideal solution. Ideally, we would also have a nice web tool for the website which generates the corresponding distribution for download.
To get things started, we could begin by only supporting downloading/creating the "fat" version with the script. The fat version would then consist of the slim distribution plus whatever we deem important for new users to get started.

Cheers,
Till

On Thu, Apr 16, 2020 at 11:33 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

Hi all,

A few points from my side:

1. I like the idea of simplifying the experience for first-time users. As for production use cases, I share Jark's opinion that there I would expect users to combine their distribution manually. I think in such scenarios it is important to understand the interconnections. Personally I'd expect the slimmest possible distribution that I can extend further with what I need in my production scenario.

2. I think there is also the problem that the matrix of possible combinations that can be useful is already big. Do we want to have a distribution for:
- SQL users: which connectors should we include? Should we include Hive? Which other catalog?
- DataStream users: which connectors should we include?
- For both of the above, should we include YARN/Kubernetes?
I would opt for providing only the "slim" distribution as a release artifact.

3. However, as I said, I think it's worth investigating how we can improve the user experience. What do you think of providing a tool, e.g. a shell script, that constructs a distribution based on the user's choice? I think that is also what Chesnay mentioned as "tooling to assemble custom distributions". In the end, the difference between a slim and a fat distribution comes down to which jars we put into lib/, right? It could have a few "screens":

1. Which API are you interested in?
   a. SQL API
   b. DataStream API

2. [SQL] Which connectors do you want to use? [multichoice]
   a. Kafka
   b. Elasticsearch
   ...

3. [SQL] Which catalog do you want to use?
   ...

Such a tool would download all the dependencies from Maven and put them into the correct folder. In the future we can extend it with additional rules, e.g. kafka-0.9 cannot be chosen at the same time as kafka-universal, etc.
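Very roughly, the download step could look like this (a sketch only; the script name and the way artifacts are passed in from the "screens" are just illustrative):

    #!/usr/bin/env bash
    # sketch: fetch the selected connectors/formats from Maven Central into lib/
    FLINK_VERSION="1.10.0"
    REPO="https://repo1.maven.org/maven2/org/apache/flink"
    cd "$(dirname "$0")/../lib" || exit 1
    for artifact in "$@"; do   # e.g. flink-json flink-csv flink-sql-connector-kafka_2.11
      curl -fLO "${REPO}/${artifact}/${FLINK_VERSION}/${artifact}-${FLINK_VERSION}.jar"
    done

An invocation like "assemble-dist.sh flink-json flink-csv flink-sql-connector-kafka_2.11" would then turn the slim distribution into a Kafka+JSON-capable one.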
The benefit of it would be that the distribution that we release could remain "slim", or we could even make it slimmer. I might be missing something here though.

Best,
Dawid

On 16/04/2020 11:02, Aljoscha Krettek wrote:

I want to reinforce my opinion from earlier: this is about improving the situation both for first-time users and for experienced users that want to use a Flink dist in production. The current Flink dist is too "thin" for first-time SQL users and it is too "fat" for production users, so we are serving no one properly with the current middle ground. That's why I think introducing those specialized "spins" of Flink dist would be good.

By the way, at some point in the future production users might not even need to get a Flink dist anymore. They should be able to have Flink as a dependency of their project (including the runtime) and then build an image from this for Kubernetes or a fat jar for YARN.

Aljoscha

On 15.04.20 18:14, wenlong.lwl wrote:

Hi all,

Regarding slim and fat distributions, I think different kinds of jobs may prefer different types of distribution:

For DataStream jobs, I think we may not want a fat distribution containing connectors, because users always need to depend on the connector in their user code anyway, and it is easy to include the connector jar in the user lib. Fewer jars in lib means fewer class conflicts and problems.

For SQL jobs, I think we are trying to encourage users to use pure SQL (DDL + DML) to construct their jobs. In order to improve the user experience, it may be important for Flink not only to provide as many connector jars in the distribution as possible (especially the connectors and formats we have documented well), but also to provide a mechanism to load connectors according to the DDLs.

So I think it could be good to place connector/format jars in some directory like opt/connector, which would not affect jobs by default, and to introduce a mechanism of dynamic discovery for SQL.

Best,
Wenlong

On Wed, 15 Apr 2020 at 22:46, Jingsong Li <jingsongl...@gmail.com> wrote:

Hi,

I am thinking about both "improve the first experience" and "improve the production experience".

I'm thinking about what the common mode of Flink is: streaming jobs use Kafka? Batch jobs use Hive?

Hive 1.2.1 dependencies can be compatible with most Hive server versions, so Spark and Presto have a built-in Hive 1.2.1 dependency. Flink is currently mainly used for streaming, so let's not talk about Hive.

For streaming jobs, the jobs I have in mind are (with respect to connectors):
- ETL jobs: Kafka -> Kafka
- Join jobs: Kafka -> DimJDBC -> Kafka
- Aggregation jobs: Kafka -> JDBCSink
So Kafka and JDBC are probably the most commonly used; of course, this also includes the CSV and JSON formats. So we could provide a fat distribution:
- with CSV and JSON;
- with flink-kafka-universal and Kafka dependencies;
- with flink-jdbc.
Using this fat distribution, most users can run their jobs well (a JDBC driver jar is required, but that is very natural to provide). Can these dependencies lead to conflicts? Only Kafka may have conflicts, but if our goal is to use kafka-universal to support all Kafka versions, we can hope to cover the vast majority of users.
We don't want to put all jars into the fat distribution, only the less conflict-prone and most common ones; of course, which jars go into the fat distribution is a matter of consideration. We have the opportunity to make things easier for the majority of users, while also leaving room for customization.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 10:09 PM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think we should first reach a consensus on "what problem do we want to solve?": (1) improve the first experience, or (2) improve the production experience?

As far as I can see from the above discussion, what we want to solve is the "first experience". And I think the slim distribution is still the best for production, because it's easier to assemble jars than to exclude jars, and it can avoid potential class conflicts.

If we want to improve the "first experience", I think it makes sense to have a fat distribution to give users a smoother first experience. But I would like to call it a "playground distribution" or something like that, to explicitly distinguish it from the "slim production-purpose distribution". The "playground distribution" can contain some widely used jars, like the universal-kafka-sql-connector, elasticsearch7-sql-connector, avro, json, csv, etc. We could even provide a playground docker image which contains the fat distribution, python3, and hive.

Best,
Jark

On Wed, 15 Apr 2020 at 21:47, Chesnay Schepler <ches...@apache.org> wrote:

I don't see a lot of value in having multiple distributions.

The simple reality is that no fat distribution we could provide would satisfy all use-cases, so why even try. If users commonly run into issues for certain jars, then maybe those should be added to the current distribution.

Personally though, I still believe we should only distribute a slim version. I'd rather have users always add required jars to the distribution than only when they go outside our "expected" use-cases. Then we might finally address this issue properly, i.e., tooling to assemble custom distributions and/or better error messages if Flink-provided extensions cannot be found.

On 15/04/2020 15:23, Kurt Young wrote:

Regarding the specific solution, I'm not sure about the "fat" and "slim" approach though. I get the idea that we can make the slim one even more lightweight than the current distribution, but what about the "fat" one? Do you mean that we would package all connectors and formats into it? I'm not sure that is feasible.
For example, we can't put all versions of the Kafka and Hive connector jars into the lib directory, and we also might need Hadoop jars when using the filesystem connector to access data from HDFS.

So my guess would be that we hand-pick some of the most frequently used connectors and formats for our lib directory, like the Kafka, CSV, and JSON ones mentioned above, and still leave some other connectors out of it. If this is the case, then why don't we just provide this one distribution to users? I'm not sure I see the benefit of providing another super "slim" distribution (we have to pay some cost to provide another suite of distributions).

What do you think?

Best,
Kurt

On Wed, Apr 15, 2020 at 7:08 PM Jingsong Li <jingsongl...@gmail.com> wrote:

Big +1.

I like "fat" and "slim".

For csv and json, like Jark said, they are quite small and don't have other dependencies. They are important to the Kafka connector, and important to the upcoming filesystem connector too. So can we put them into both "fat" and "slim"? They're so important, and they're so lightweight.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 4:53 PM godfrey he <godfre...@gmail.com> wrote:

Big +1. This will improve the user experience (especially for new Flink users). We have answered so many questions about "class not found".

Best,
Godfrey

On Wed, Apr 15, 2020 at 4:30 PM, Dian Fu <dian0511...@gmail.com> wrote:

+1 to this proposal.

Missing connector jars is also a big problem for PyFlink users. Currently, after a Python user has installed PyFlink using `pip`, they have to manually copy the connector fat jars to the PyFlink installation directory for the connectors to be usable when running jobs locally. This process is very confusing for users and affects the experience a lot.

Regards,
Dian

On Apr 15, 2020, at 3:51 PM, Jark Wu <imj...@gmail.com> wrote:

+1 to the proposal. I also found the "download additional jar" step really verbose when I prepared webinars.

At the very least, I think flink-csv and flink-json should be in the distribution; they are quite small and don't have other dependencies.
Best,
Jark

On Wed, 15 Apr 2020 at 15:44, Jeff Zhang <zjf...@gmail.com> wrote:

Hi Aljoscha,

Big +1 for the fat Flink distribution. Where do you plan to put these connectors, opt or lib?

On Wed, Apr 15, 2020 at 3:30 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

Hi Everyone,

I'd like to discuss releasing a more full-featured Flink distribution. The motivation is that there is friction for SQL/Table API users that want to use Table connectors which are not in the current Flink distribution. For these users the workflow is currently roughly:

- download Flink dist
- configure csv/Kafka/json connectors per configuration
- run SQL client or program
- decrypt the error message and research the solution
- download additional connector jars (see the sketch at the end of this mail)
- program works correctly

I realize that this can be made to work, but if every SQL user has this as their first experience, that doesn't seem good to me.

My proposal is to provide two versions of the Flink distribution in the future: "fat" and "slim" (names to be discussed):

- slim would be even trimmer than today's distribution
- fat would contain a lot of convenience connectors (yet to be determined which ones)

And yes, I realize that there are already more dimensions of Flink releases (Scala version and Java version).

For background, our current Flink dist has these in the opt directory:

- flink-azure-fs-hadoop-1.10.0.jar
- flink-cep-scala_2.12-1.10.0.jar
- flink-cep_2.12-1.10.0.jar
- flink-gelly-scala_2.12-1.10.0.jar
- flink-gelly_2.12-1.10.0.jar
- flink-metrics-datadog-1.10.0.jar
- flink-metrics-graphite-1.10.0.jar
- flink-metrics-influxdb-1.10.0.jar
- flink-metrics-prometheus-1.10.0.jar
- flink-metrics-slf4j-1.10.0.jar
- flink-metrics-statsd-1.10.0.jar
- flink-oss-fs-hadoop-1.10.0.jar
- flink-python_2.12-1.10.0.jar
- flink-queryable-state-runtime_2.12-1.10.0.jar
- flink-s3-fs-hadoop-1.10.0.jar
- flink-s3-fs-presto-1.10.0.jar
- flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
- flink-sql-client_2.12-1.10.0.jar
- flink-state-processor-api_2.12-1.10.0.jar
- flink-swift-fs-hadoop-1.10.0.jar

The current Flink dist is 267M. If we removed everything from opt we would go down to 126M. I would recommend this, because the large majority of the files in opt are probably unused.
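To make the friction concrete, the "download additional connector jars" step from the list above currently boils down to something like this (versions and URLs are only illustrative):

    # manual fix-up a first-time SQL user has to discover on their own
    cd flink-1.10.0/lib
    wget https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.10.0/flink-sql-connector-kafka_2.11-1.10.0.jar
    wget https://repo1.maven.org/maven2/org/apache/flink/flink-json/1.10.0/flink-json-1.10.0.jar
    # then restart the (local) cluster / SQL client so the new jars are picked up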
What do you think?

Best,
Aljoscha

--
Best Regards

Jeff Zhang

--
Best, Jingsong Lee

--
Best, Jingsong Lee