I think one problem with the "one fat jar for all" approach is that some
dependencies have clashing class names across versions:

- Kafka 0.9, 0.10, 0.11, 1.0
- Elasticsearch 2, 4, and 5
There are probably others as well...

On Tue, Feb 27, 2018 at 2:57 PM, Timo Walther <twal...@apache.org> wrote:

> Hi Xingcan,
>
> thank you for your feedback. Regarding (3), we also thought about that, but
> this approach would not scale very well. Given that we might have fat jars
> for multiple versions (Kafka 0.8, Kafka 0.6, etc.), such an all-in-one
> JAR file might easily go beyond 1 or 2 GB. I don't know if users want to
> download that just for a combination of connector and format.
>
> Timo
>
> On 2/27/18 at 2:16 PM, Xingcan Cui wrote:
>
>> Hi Timo,
>>
>> thanks for your efforts. Personally, I think the second option would be
>> better; here are my thoughts.
>>
>> (1) The SQL client is designed to offer a convenient way for users to
>> manipulate data with Flink. Obviously, the second option would be easier
>> to use.
>>
>> (2) The script will help to manage the dependencies automatically, but
>> with less flexibility. Once the script cannot meet a need, users have to
>> modify it themselves.
>>
>> (3) I wonder whether we could package all these built-in connectors and
>> formats into a single JAR. With this all-in-one solution, users wouldn't
>> need to worry much about the dependencies.
>>
>> Best,
>> Xingcan
>>
>>> On 27 Feb 2018, at 6:38 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>> My first intuition would be to go for approach #2 for the following
>>> reasons:
>>>
>>> - I expect that in the long run, the scripts will not be that simple to
>>> maintain. We have seen that with all shell scripts so far: they start
>>> simple and then grow with many special cases for this and that setup.
>>>
>>> - Not all users have Maven. Automatically downloading and configuring
>>> Maven could be an option, but that makes the scripts yet more tricky.
>>> - Download-and-drop-in is probably still easier for users to understand
>>> than the syntax of a script with its parameters.
>>>
>>> - I think it may actually be even simpler for us to maintain, because all
>>> it does is add a profile or build target to each connector to also create
>>> the fat jar.
>>>
>>> - Storage space is no longer really a problem. Worst case, we host the
>>> fat jars in an S3 bucket.
>>>
>>>> On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <twal...@apache.org> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> as you may know, a first minimal version of FLIP-24 [1] for the upcoming
>>>> Flink SQL Client has been merged to the master. We also merged the
>>>> ability to discover and configure table sources without a single line of
>>>> code, using string-based properties [2] and Java service provider
>>>> discovery.
>>>>
>>>> We are now facing the issue of how to manage dependencies in this new
>>>> environment. It is different from how regular Flink projects are created
>>>> (by setting up a new Maven project and building a jar or fat jar).
>>>> Ideally, a user should be able to select from a set of prepared
>>>> connectors, catalogs, and formats. E.g., if the Kafka connector and the
>>>> Avro format are needed, all that should be required is to move a
>>>> "flink-kafka.jar" and "flink-avro.jar" into the "sql_lib" directory that
>>>> is shipped to a Flink cluster together with the SQL query.
>>>>
>>>> The question is how we want to offer those JAR files in the future. We
>>>> see two options:
>>>>
>>>> 1) We prepare Maven build profiles for all offered modules and provide a
>>>> shell script for building fat jars. A script call could look like
>>>> "./sql-client-dependency.sh kafka 0.10". It would automatically download
>>>> what is needed and place the JAR file in the library folder.
>>>> This approach would keep our development effort low, but it would
>>>> require Maven to be present and builds to pass in different environments
>>>> (e.g. Windows).
>>>>
>>>> 2) We build fat jars for these modules with every Flink release and host
>>>> them somewhere (e.g. on Apache infrastructure, but not on Maven Central).
>>>> This would make it very easy to add a dependency by downloading the
>>>> prepared JAR files. However, it would require building and hosting large
>>>> fat jars for every connector (and version) with every Flink major and
>>>> minor release. The size of such a repository might grow quickly.
>>>>
>>>> What do you think? Do you see other options to make adding dependencies
>>>> as easy as possible?
>>>>
>>>> Regards,
>>>>
>>>> Timo
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>>> [2] https://issues.apache.org/jira/browse/FLINK-8240
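For option 1, the proposed "./sql-client-dependency.sh kafka 0.10" call could be little more than a thin wrapper around Maven. Below is a minimal sketch of that idea; the module naming scheme (`flink-connector-<name>-<version>`) and the shared Maven profile name (`sql-jars`) are illustrative assumptions, not part of the actual Flink build.

```shell
# Sketch of what "./sql-client-dependency.sh kafka 0.10" might do (option 1).
# The module name pattern and the "sql-jars" profile are hypothetical.
build_sql_jar() {
  connector="$1"   # e.g. "kafka"
  version="$2"     # e.g. "0.10"
  module="flink-connector-${connector}-${version}"
  # A real script would execute this command and then copy the resulting
  # shaded (fat) jar into the sql_lib directory shipped to the cluster.
  echo "mvn -pl ${module} -P sql-jars clean package"
}

build_sql_jar kafka 0.10
# prints: mvn -pl flink-connector-kafka-0.10 -P sql-jars clean package
```

Under option 2, by contrast, the user-facing step would shrink to downloading one prebuilt jar and dropping it into the `sql_lib` directory by hand.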