Hi Timo,

thanks for your efforts. Personally, I think the second option would be better, 
and here are my reasons. 

(1) The SQL client is designed to offer a convenient way for users to 
manipulate data with Flink. Obviously, the second option would be easier 
to use. 

(2) The script would help to manage the dependencies automatically, but with 
less flexibility. If the script cannot meet a particular need, users would have 
to modify it themselves. 

(3) I wonder whether we could package all these built-in connectors and formats 
into a single JAR. With this all-in-one solution, users wouldn't need to think 
much about the dependencies at all.

Best,
Xingcan

> On 27 Feb 2018, at 6:38 PM, Stephan Ewen <se...@apache.org> wrote:
> 
> My first intuition would be to go for approach #2 for the following reasons:
> 
> - I expect that in the long run, the scripts will not be that simple to
> maintain. We saw that with all shell scripts thus far: they start simple,
> and then grow with many special cases for this and that setup.
> 
> - Not all users have Maven; automatically downloading and configuring
> Maven could be an option, but that would make the scripts yet more tricky.
> 
> - Download-and-drop-in is probably still easier to understand for users
> than the syntax of a script with its parameters.
> 
> - I think it may actually be even simpler to maintain for us, because all
> it does is add a profile or build target to each connector to also create
> the fat jar.
> 
> - Storage space is no longer really a problem. Worst case we host the fat
> jars in an S3 bucket.
> 
> 
> On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <twal...@apache.org> wrote:
> 
>> Hi everyone,
>> 
>> as you may know, a first minimal version of FLIP-24 [1] for the upcoming
>> Flink SQL Client has been merged to master. We also merged support for
>> discovering and configuring table sources without a single line of code,
>> using string-based properties [2] and Java service provider discovery.
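>> 
>> To make the mechanism concrete, here is a minimal sketch of how such
>> property-based discovery could look, using java.util.ServiceLoader. The
>> factory interface and method names below are hypothetical and only for
>> illustration; the actual Flink interfaces may differ:
>> 
>>     import java.util.Map;
>>     import java.util.ServiceLoader;
>> 
>>     // Hypothetical factory interface; each connector/format module would
>>     // register an implementation via META-INF/services.
>>     interface TableSourceFactory {
>>         // The string properties this factory matches,
>>         // e.g. connector.type=kafka.
>>         Map<String, String> requiredContext();
>>     }
>> 
>>     public class Discovery {
>>         // Pick the factory whose required context is contained in the
>>         // user-supplied string-based properties.
>>         static TableSourceFactory find(Map<String, String> properties) {
>>             for (TableSourceFactory f : ServiceLoader.load(TableSourceFactory.class)) {
>>                 if (properties.entrySet().containsAll(f.requiredContext().entrySet())) {
>>                     return f;
>>                 }
>>             }
>>             throw new IllegalArgumentException("No matching factory for: " + properties);
>>         }
>>     }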
>> 
>> We are now facing the issue of how to manage dependencies in this new
>> environment. It is different from how regular Flink projects are created
>> (by setting up a new Maven project and building a jar or fat jar). Ideally,
>> a user should be able to select from a set of prepared connectors,
>> catalogs, and formats. E.g., if a Kafka connector and the Avro format are
>> needed, all that should be required is to move a "flink-kafka.jar" and
>> "flink-avro.jar" into the "sql_lib" directory that is shipped to a Flink
>> cluster together with the SQL query.
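>> 
>> As an illustration of the drop-in idea (a rough sketch, not the actual
>> implementation), the client could scan "sql_lib" and run the service
>> provider discovery described above against a classloader that contains
>> all dropped-in JARs:
>> 
>>     import java.io.File;
>>     import java.net.URL;
>>     import java.net.URLClassLoader;
>>     import java.util.ArrayList;
>>     import java.util.List;
>>     import java.util.ServiceLoader;
>> 
>>     public class SqlLibLoader {
>>         // Collect all JARs from the library directory and expose their
>>         // service provider entries through a dedicated classloader.
>>         static <T> ServiceLoader<T> loadFrom(File libDir, Class<T> service)
>>                 throws Exception {
>>             List<URL> urls = new ArrayList<>();
>>             for (File jar : libDir.listFiles((dir, name) -> name.endsWith(".jar"))) {
>>                 urls.add(jar.toURI().toURL());
>>             }
>>             ClassLoader loader = new URLClassLoader(
>>                 urls.toArray(new URL[0]), SqlLibLoader.class.getClassLoader());
>>             return ServiceLoader.load(service, loader);
>>         }
>>     }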
>> 
>> The question is: how do we want to offer those JAR files in the future? We
>> see two options:
>> 
>> 1) We prepare Maven build profiles for all offered modules and provide a
>> shell script for building fat jars. A script call could look like
>> "./sql-client-dependency.sh kafka 0.10". It would automatically download
>> what is needed and place the JAR file in the library folder. This approach
>> would keep our development effort low but would require Maven to be present
>> and builds to pass on different environments (e.g. Windows).
>> 
>> 2) We build fat jars for these modules with every Flink release that can
>> be hosted somewhere (e.g., on Apache infrastructure, but not Maven Central).
>> This would make it very easy to add a dependency by downloading the
>> prepared JAR files. However, it would require building and hosting large fat
>> jars for every connector (and version) with every Flink major and minor
>> release. The size of such a repository might grow quickly.
>> 
>> What do you think? Do you see other options to make adding dependencies as
>> easy as possible?
>> 
>> 
>> Regards,
>> 
>> Timo
>> 
>> 
>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>> 
>> [2] https://issues.apache.org/jira/browse/FLINK-8240
>> 
>> 
