I think one problem with the "one fat jar for all" approach is that some
dependencies have clashing class names across versions:

- Kafka 0.9, 0.10, 0.11, 1.0
- Elasticsearch 2, 4, and 5
There are probably others as well...

On Tue, Feb 27, 2018 at 2:57 PM, Timo Walther <twal...@apache.org> wrote:

> Hi Xingcan,
>
> thank you for your feedback. Regarding (3), we also thought about that, but
> this approach would not scale very well. Given that we might have fat jars
> for multiple versions (Kafka 0.8, Kafka 0.6, etc.), such an all-in-one
> JAR file might easily go beyond 1 or 2 GB. I don't know if users want to
> download that just for a combination of connector and format.
>
> Timo
>
> On 2/27/18 at 2:16 PM, Xingcan Cui wrote:
>
>> Hi Timo,
>>
>> thanks for your efforts. Personally, I think the second option would be
>> better; here are my thoughts.
>>
>> (1) The SQL client is designed to offer a convenient way for users to
>> manipulate data with Flink. Obviously, the second option would be easier
>> to use.
>>
>> (2) The script will help to manage the dependencies automatically, but
>> with less flexibility. Once the script cannot meet a need, users have to
>> modify it themselves.
>>
>> (3) I wonder whether we could package all these built-in connectors and
>> formats into a single JAR. With this all-in-one solution, users wouldn't
>> need to worry much about the dependencies.
>>
>> Best,
>> Xingcan
>>
>>> On 27 Feb 2018, at 6:38 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>> My first intuition would be to go for approach #2 for the following
>>> reasons:
>>>
>>> - I expect that in the long run, the scripts will not be that simple to
>>> maintain. We have seen that with all shell scripts so far: they start
>>> simple and then grow with many special cases for this and that setup.
>>>
>>> - Not all users have Maven. Automatically downloading and configuring
>>> Maven could be an option, but that makes the scripts yet more tricky.
>>> - Download-and-drop-in is probably still easier for users to understand
>>> than the syntax of a script with its parameters.
>>>
>>> - I think it may actually be even simpler for us to maintain, because all
>>> it does is add a profile or build target to each connector to also create
>>> the fat jar.
>>>
>>> - Storage space is no longer really a problem. Worst case, we host the
>>> fat jars in an S3 bucket.
>>>
>>>> On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <twal...@apache.org> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> as you may know, a first minimal version of FLIP-24 [1] for the upcoming
>>>> Flink SQL Client has been merged to the master. We also merged the
>>>> ability to discover and configure table sources without a single line of
>>>> code, using string-based properties [2] and Java service provider
>>>> discovery.
>>>>
>>>> We are now facing the issue of how to manage dependencies in this new
>>>> environment. It is different from how regular Flink projects are created
>>>> (by setting up a new Maven project and building a jar or fat jar).
>>>> Ideally, a user should be able to select from a set of prepared
>>>> connectors, catalogs, and formats. E.g., if the Kafka connector and the
>>>> Avro format are needed, all that should be required is to move a
>>>> "flink-kafka.jar" and "flink-avro.jar" into the "sql_lib" directory that
>>>> is shipped to a Flink cluster together with the SQL query.
>>>>
>>>> The question is how we want to offer those JAR files in the future. We
>>>> see two options:
>>>>
>>>> 1) We prepare Maven build profiles for all offered modules and provide a
>>>> shell script for building fat jars. A script call could look like
>>>> "./sql-client-dependency.sh kafka 0.10". It would automatically download
>>>> what is needed and place the JAR file in the library folder.
>>>> This approach would keep our development effort low, but it would
>>>> require Maven to be present and builds to pass in different environments
>>>> (e.g. Windows).
>>>>
>>>> 2) We build fat jars for these modules with every Flink release and host
>>>> them somewhere (e.g. on Apache infrastructure, but not on Maven Central).
>>>> This would make it very easy to add a dependency by downloading the
>>>> prepared JAR files. However, it would require building and hosting large
>>>> fat jars for every connector (and version) with every Flink major and
>>>> minor release. The size of such a repository might grow quickly.
>>>>
>>>> What do you think? Do you see other options to make adding dependencies
>>>> as easy as possible?
>>>>
>>>> Regards,
>>>>
>>>> Timo
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>>> [2] https://issues.apache.org/jira/browse/FLINK-8240
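For option 1, the proposed "./sql-client-dependency.sh kafka 0.10" call could be little more than a thin wrapper around Maven. Below is a minimal sketch of that idea; the module naming scheme (`flink-connector-<name>-<version>`) and the shared Maven profile name (`sql-jars`) are illustrative assumptions, not part of the actual Flink build.

```shell
# Sketch of what "./sql-client-dependency.sh kafka 0.10" might do (option 1).
# The module name pattern and the "sql-jars" profile are hypothetical.
build_sql_jar() {
  connector="$1"   # e.g. "kafka"
  version="$2"     # e.g. "0.10"
  module="flink-connector-${connector}-${version}"
  # A real script would execute this command and then copy the resulting
  # shaded (fat) jar into the sql_lib directory shipped to the cluster.
  echo "mvn -pl ${module} -P sql-jars clean package"
}

build_sql_jar kafka 0.10
# prints: mvn -pl flink-connector-kafka-0.10 -P sql-jars clean package
```

Under option 2, by contrast, the user-facing step would shrink to downloading one prebuilt jar and dropping it into the `sql_lib` directory by hand.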