Hi Xingcan,
thank you for your feedback. Regarding (3), we also thought about that,
but this approach would not scale very well. Given that we might have
fat jars for multiple versions (Kafka 0.8, Kafka 0.6, etc.), such an
all-in-one JAR file might easily go beyond 1 or 2 GB. I don't
know if users want to download that just for a combination of connector
and format.
Timo
On 2/27/18 at 2:16 PM, Xingcan Cui wrote:
Hi Timo,
thanks for your efforts. Personally, I think the second option would be
better, and here are my reasons.
(1) The SQL client is designed to offer a convenient way for users to
manipulate data with Flink. Obviously, the second option would be easier
to use.
(2) The script will help to manage the dependencies automatically, but with
less flexibility. Once the script cannot meet their needs, users have to
modify it themselves.
(3) I wonder whether we could package all these built-in connectors and formats
into a single JAR. With this all-in-one solution, users would not need to
think much about the dependencies.
Best,
Xingcan
On 27 Feb 2018, at 6:38 PM, Stephan Ewen <se...@apache.org> wrote:
My first intuition would be to go for approach #2 for the following reasons:
- I expect that in the long run, the scripts will not be that simple to
maintain. We saw that with all shell scripts thus far: they start simple,
and then grow with many special cases for this and that setup.
- Not all users have Maven; automatically downloading and configuring
Maven could be an option, but that would make the scripts even trickier.
- Download-and-drop-in is probably still easier to understand for users
than the syntax of a script with its parameters.
- I think it may actually be even simpler to maintain for us, because all
it does is add a profile or build target to each connector to also create
the fat jar.
- Storage space is no longer really a problem. Worst case we host the fat
jars in an S3 bucket.
On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <twal...@apache.org> wrote:
Hi everyone,
as you may know, a first minimum version of FLIP-24 [1] for the upcoming
Flink SQL Client has been merged into the master. We also merged support
for discovering and configuring table sources without a single line of
code, using string-based properties [2] and Java service provider
discovery.
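Just to make this concrete, such a purely declarative table source
definition could look roughly like the following. The file name, property
keys, and layout are only illustrative, not necessarily the exact ones
from FLINK-8240:

# Illustrative sketch only: a table source declared entirely via string
# properties, written into a small environment file.
cat > sql-client-env.yaml <<'EOF'
tables:
  - name: TaxiRides
    connector:
      type: kafka      # resolved to a table source factory via service provider discovery
      version: "0.10"
    format:
      type: avro       # the format factory is found the same way
EOF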
We are now facing the issue of how to manage dependencies in this new
environment. It is different from how regular Flink projects are created
(by setting up a new Maven project and building a jar or fat jar). Ideally,
a user should be able to select from a set of prepared connectors,
catalogs, and formats. E.g., if a Kafka connector and the Avro format are
needed, all that should be required is to move a "flink-kafka.jar" and
"flink-avro.jar" into the "sql_lib" directory that is shipped to a Flink
cluster together with the SQL query.
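From the user's point of view, the whole setup would then boil down to
something like this (the directory and jar names are the illustrative ones
from above, and the start command assumes the embedded mode described in
FLIP-24):

# Intended user workflow: no Maven project, just drop the prepared fat
# jars into the library folder and start the client.
mkdir -p sql_lib
cp flink-kafka.jar flink-avro.jar sql_lib/
./bin/sql-client.sh embedded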
The question is: how do we want to offer those JAR files in the future? We
see two options:
1) We prepare Maven build profiles for all offered modules and provide a
shell script for building fat jars. A script call could look like
"./sql-client-dependency.sh kafka 0.10". It would automatically download
what is needed and place the JAR file in the library folder. This approach
would keep our development effort low but would require Maven to be present
and builds to pass on different environments (e.g. Windows).
2) We build fat jars for these modules with every Flink release that can
be hosted somewhere (e.g. Apache infrastructure, but not Maven Central).
This would make it very easy to add a dependency by downloading the
prepared JAR files (see the second sketch below). However, it would
require building and hosting large fat jars for every connector (and
version) with every Flink major and minor release. The size of such a
repository might grow quickly.
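To illustrate the difference a bit more, here are two rough sketches. For
option 1, the script could roughly do the following (module path, profile
name, and flags are placeholders, not an actual proposal):

#!/usr/bin/env bash
# Rough sketch of a possible sql-client-dependency.sh (option 1).
# Module layout, profile name, and target directory are placeholders.
set -euo pipefail

CONNECTOR=$1   # e.g. "kafka"
VERSION=$2     # e.g. "0.10"
MODULE="flink-connectors/flink-connector-${CONNECTOR}-${VERSION}"

# Build only the requested connector with a (hypothetical) fat-jar profile...
mvn -q -pl "${MODULE}" -am -Psql-jars package -DskipTests

# ...and place the resulting fat jar into the SQL client library folder.
cp "${MODULE}"/target/*-fat.jar sql_lib/

For option 2, adding a dependency would instead be a plain download (the
URL is just a placeholder, since the hosting location is exactly what is
still open):

# Hypothetical download-and-drop-in workflow for option 2; URL and file
# name are placeholders.
curl -LO https://<hosting-location>/flink-connector-kafka-0.10-fat.jar
mv flink-connector-kafka-0.10-fat.jar sql_lib/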
What do you think? Do you see other options to make adding dependencies
as easy as possible?
Regards,
Timo
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
[2] https://issues.apache.org/jira/browse/FLINK-8240