Hi everyone,
thanks for your opinions. So the majority voted for option (2), fat jars that are ready to be used. I will create a Jira issue and prepare the infrastructure for the first connector and first format.
Regards,
Timo
On 3/1/18 at 11:38 AM, Fabian Hueske wrote:
I agree, option (2) would be the easiest approach for the users.
2018-03-01 0:00 GMT+01:00 Rong Rong <walter...@gmail.com>:
Hi Timo,
Thanks for initiating the SQL client effort. I agree with Xingcan's points, and would add: (1) most users of the SQL client will very likely have little Maven / build tool knowledge, and (2) the build script would most likely grow much more complex in the future, which makes it exponentially harder for users to modify it themselves.
On (3), the single "fat" jar idea: in addition to the dependency conflict issue, another very common pattern I see is that users want to maintain a list of individual jars, such as a set of relatively constant, handy UDFs, every time they use the SQL client. They will probably need to package and ship those separately anyway. I was wondering if "download-and-drop-in" might be a more straightforward approach?
Best,
Rong
On Tue, Feb 27, 2018 at 8:23 AM, Stephan Ewen <se...@apache.org> wrote:
I think one problem with the "one fat jar for all" approach is that some dependencies clash in their class names across versions:
- Kafka 0.9, 0.10, 0.11, 1.0
- Elasticsearch 2, 4, and 5
There are probably others as well...
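(As a small, purely illustrative aside and not Flink code: the sketch below prints which JAR a given class was actually loaded from; the class name is just an assumed example. With two versions of the same dependency bundled into one fat jar, only one of the identically named classes can be loaded, and the other is silently shadowed.)

    public class WhichJarSketch {
        public static void main(String[] args) throws Exception {
            // Example of a class name that exists in several dependency versions;
            // replace with any class that clashes across the bundled JARs.
            String clashingClass = "org.apache.kafka.clients.consumer.KafkaConsumer";

            Class<?> clazz = Class.forName(clashingClass);
            // Whichever JAR comes first on the classpath provides the class.
            System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
        }
    }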
On Tue, Feb 27, 2018 at 2:57 PM, Timo Walther <twal...@apache.org> wrote:
Hi Xingcan,
thank you for your feedback. Regarding (3), we also thought about that, but this approach would not scale very well. Given that we might have fat jars for multiple versions (Kafka 0.8, Kafka 0.6 etc.), such an all-in-one JAR file might easily go beyond 1 or 2 GB. I don't know if users want to download that just for one combination of connector and format.
Timo
On 2/27/18 at 2:16 PM, Xingcan Cui wrote:
Hi Timo,
thanks for your efforts. Personally, I think the second option would be better, and here are my thoughts.
(1) The SQL client is designed to offer a convenient way for users to manipulate data with Flink. Obviously, the second option would be easier to use.
(2) The script would help to manage the dependencies automatically, but with less flexibility. Once the script cannot meet a need, users have to modify it themselves.
(3) I wonder whether we could package all these built-in connectors and formats into a single JAR. With this all-in-one solution, users wouldn't need to think much about the dependencies.
Best,
Xingcan
On 27 Feb 2018, at 6:38 PM, Stephan Ewen <se...@apache.org> wrote:
My first intuition would be to go for approach #2, for the following reasons:
- I expect that in the long run, the scripts will not be that simple to maintain. We saw that with all shell scripts thus far: they start simple, and then grow with many special cases for this and that setup.
- Not all users have Maven. Automatically downloading and configuring Maven could be an option, but that makes the scripts yet more tricky.
- Download-and-drop-in is probably still easier for users to understand than the syntax of a script with its parameters.
- I think it may actually be even simpler to maintain for us, because all it does is add a profile or build target to each connector to also create the fat jar.
- Storage space is no longer really a problem. Worst case, we host the fat jars in an S3 bucket.
On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <twal...@apache.org> wrote:
Hi everyone,
as you may know, a first minimal version of FLIP-24 [1] for the upcoming Flink SQL Client has been merged to master. We also merged the ability to discover and configure table sources without a single line of code, using string-based properties [2] and Java service provider discovery.
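For context, here is a minimal, hypothetical sketch (plain Java, with made-up class and property names, not the actual Flink API) of what such service-provider-based discovery can look like on the client side:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.ServiceLoader;

    // Hypothetical factory interface; the real Flink interface and its methods may differ.
    interface TableSourceFactory {
        // Returns true if this factory can handle the given string-based properties.
        boolean matches(Map<String, String> properties);
    }

    public class DiscoverySketch {
        public static void main(String[] args) {
            // Illustrative property keys, not necessarily the real ones.
            Map<String, String> props = new HashMap<>();
            props.put("connector.type", "kafka");
            props.put("format.type", "avro");

            // Standard Java service provider discovery: every JAR on the classpath
            // (e.g. one dropped into the sql_lib directory) can register an
            // implementation under META-INF/services, and the client picks it up
            // without a single line of user code.
            for (TableSourceFactory factory : ServiceLoader.load(TableSourceFactory.class)) {
                if (factory.matches(props)) {
                    System.out.println("Matching factory: " + factory.getClass().getName());
                }
            }
        }
    }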
We are now facing the issue of how to manage dependencies in this new environment. It is different from how regular Flink projects are created (by setting up a new Maven project and building a jar or fat jar). Ideally, a user should be able to select from a set of prepared connectors, catalogs, and formats. E.g., if a Kafka connector and Avro format are needed, all that should be required is to move a "flink-kafka.jar" and "flink-avro.jar" into the "sql_lib" directory that is shipped to a Flink cluster together with the SQL query.
The question is how we want to offer those JAR files in the future. We see two options:
1) We prepare Maven build profiles for all offered modules and provide a shell script for building fat jars. A script call could look like "./sql-client-dependency.sh kafka 0.10". It would automatically download what is needed and place the JAR file in the library folder. This approach would keep our development effort low, but would require Maven to be present and builds to pass on different environments (e.g. Windows).
2) We build fat jars for these modules with every Flink release and host them somewhere (e.g. Apache infrastructure, but not Maven central). This would make it very easy to add a dependency by downloading the prepared JAR files. However, it would require building and hosting large fat jars for every connector (and version) with every Flink major and minor release. The size of such a repository might grow quickly.
What do you think? Do you see other options to make adding dependencies as easy as possible?
Regards,
Timo
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
[2] https://issues.apache.org/jira/browse/FLINK-8240