Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

Chesnay Schepler Wed, 15 Apr 2020 06:48:11 -0700

I don't see a lot of value in having multiple distributions.

The simple reality is that no fat distribution we could provide would satisfy all use-cases, so why even try. If users commonly run into issues for certain jars, then maybe those should be added to the current distribution.

Personally though I still believe we should only distribute a slim version. I'd rather have users always add required jars to the distribution than only when they go outside our "expected" use-cases. Then we might finally address this issue properly, i.e., tooling to assemble custom distributions and/or better error messages if Flink-provided extensions cannot be found.
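
As a rough illustration of what "tooling to assemble custom distributions" could look like, here is a minimal sketch in Python. It assumes a distribution directory with the `opt` and `lib` subdirectories mentioned in this thread. The function name `assemble_distribution` and the prefix-matching rule are hypothetical and are not part of Flink.

```python
import shutil
from pathlib import Path


def assemble_distribution(dist_dir, wanted_prefixes):
    """Copy jars whose file name starts with a wanted prefix from opt/ into lib/.

    Hypothetical helper, not an existing Flink tool. Returns the sorted names
    of the copied jars and the list of prefixes that matched nothing, so the
    caller can report missing extensions with a clear message.
    """
    dist = Path(dist_dir)
    opt_dir, lib_dir = dist / "opt", dist / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)

    copied, missing = [], []
    for prefix in wanted_prefixes:
        # Assumption: jars are identified by a file-name prefix only.
        matches = sorted(opt_dir.glob(prefix + "*.jar"))
        if not matches:
            missing.append(prefix)
            continue
        for jar in matches:
            shutil.copy2(jar, lib_dir / jar.name)
            copied.append(jar.name)
    return sorted(copied), missing
```

A caller could turn a non-empty `missing` list into an error message that names the absent jars up front. Note that a plain prefix such as `flink-cep` would also match `flink-cep-scala`, so a real tool would need a stricter matching rule.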


On 15/04/2020 15:23, Kurt Young wrote:

Regarding the specific solution, I'm not sure about the "fat" and "slim"
approach though. I get the idea that we can make the slim one even more
lightweight than the current distribution, but what about the "fat" one?
Do you mean that we would package all connectors and formats into it? I'm
not sure this is feasible. For example, we can't put all versions of the
Kafka and Hive connector jars into the lib directory, and we might also
need Hadoop jars when using the filesystem connector to access data from
HDFS.

So my guess is that we would hand-pick some of the most frequently used
connectors and formats for our "lib" directory, like the Kafka, CSV, and
JSON ones mentioned above, and still leave other connectors out of it.
If this is the case, then why don't we just provide this one distribution
to users? I'm not sure I get the benefit of providing another super "slim"
distribution (we would have to pay some cost to provide another set of
distributions).

What do you think?

Best,
Kurt


On Wed, Apr 15, 2020 at 7:08 PM Jingsong Li <[email protected]> wrote:

Big +1.

I like "fat" and "slim".

For CSV and JSON, like Jark said, they are quite small and don't have other
dependencies. They are important to the Kafka connector, and important
to the upcoming file system connector too.
So can we include them in both "fat" and "slim"? They're so important, and
they're so lightweight.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 4:53 PM godfrey he <[email protected]> wrote:

Big +1.
This will improve the user experience (especially for new Flink users).
We have answered so many questions about "class not found".

Best,
Godfrey

Dian Fu <[email protected]> wrote on Wed, Apr 15, 2020 at 4:30 PM:

+1 to this proposal.

Missing connector jars are also a big problem for PyFlink users. Currently,
after a Python user has installed PyFlink using `pip`, they have to manually
copy the connector fat jars to the PyFlink installation directory for the
connectors to be usable if they want to run jobs locally. This process is
very confusing for users and hurts the experience a lot.

Regards,
Dian

On Apr 15, 2020, at 3:51 PM, Jark Wu <[email protected]> wrote:

+1 to the proposal. I also found the "download additional jar" step is
really verbose when I prepare webinars.

At least, I think flink-csv and flink-json should be in the distribution;
they are quite small and don't have other dependencies.

Best,
Jark

On Wed, 15 Apr 2020 at 15:44, Jeff Zhang <[email protected]> wrote:

Hi Aljoscha,

Big +1 for the fat Flink distribution. Where do you plan to put these
connectors? opt or lib?

Aljoscha Krettek <[email protected]> wrote on Wed, Apr 15, 2020 at 3:30 PM:

Hi Everyone,

I'd like to discuss releasing a more full-featured Flink distribution.
The motivation is that there is friction for SQL/Table API users that want
to use Table connectors which are not in the current Flink distribution.
For these users the workflow is currently roughly:

  - download Flink dist
  - configure csv/Kafka/json connectors per configuration
  - run SQL client or program
  - decrypt error message and research the solution
  - download additional connector jars
  - program works correctly

I realize that this can be made to work, but if every SQL user has this as
their first experience that doesn't seem good to me.

My proposal is to provide two versions of the Flink distribution in the
future: "fat" and "slim" (names to be discussed):

  - slim would be even trimmer than today's distribution
  - fat would contain a lot of convenience connectors (yet to be
    determined which ones)

And yes, I realize that there are already more dimensions of Flink
releases (Scala version and Java version).

For background, our current Flink dist has these in the opt directory:

  - flink-azure-fs-hadoop-1.10.0.jar
  - flink-cep-scala_2.12-1.10.0.jar
  - flink-cep_2.12-1.10.0.jar
  - flink-gelly-scala_2.12-1.10.0.jar
  - flink-gelly_2.12-1.10.0.jar
  - flink-metrics-datadog-1.10.0.jar
  - flink-metrics-graphite-1.10.0.jar
  - flink-metrics-influxdb-1.10.0.jar
  - flink-metrics-prometheus-1.10.0.jar
  - flink-metrics-slf4j-1.10.0.jar
  - flink-metrics-statsd-1.10.0.jar
  - flink-oss-fs-hadoop-1.10.0.jar
  - flink-python_2.12-1.10.0.jar
  - flink-queryable-state-runtime_2.12-1.10.0.jar
  - flink-s3-fs-hadoop-1.10.0.jar
  - flink-s3-fs-presto-1.10.0.jar
  - flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
  - flink-sql-client_2.12-1.10.0.jar
  - flink-state-processor-api_2.12-1.10.0.jar
  - flink-swift-fs-hadoop-1.10.0.jar

The current Flink dist is 267M. If we removed everything from opt we would
go down to 126M. I would recommend this, because the large majority of the
files in opt are probably unused.
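
The figures above imply that opt accounts for roughly 267M - 126M = 141M. A minimal sketch for checking such numbers against an unpacked distribution could look like the following. The helper names `dir_size` and `size_without_opt` are made up for illustration, and the only assumption about the layout is that `opt` sits directly under the distribution root.

```python
from pathlib import Path


def dir_size(path):
    """Total size in bytes of all regular files below path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def size_without_opt(dist_dir):
    """Return (total_bytes, opt_bytes, remaining_bytes) for a distribution.

    Illustrative helper only: it measures what removing opt/ would save.
    """
    total = dir_size(dist_dir)
    opt = dir_size(Path(dist_dir) / "opt")
    return total, opt, total - opt
```

Calling `size_without_opt` on an unpacked distribution gives the total size, the share held by opt, and what would remain after removing it, all in bytes.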

What do you think?

Best,
Aljoscha

--
Best Regards

Jeff Zhang


--
Best, Jingsong Lee
