I agree, option (2) would be the easiest approach for the users.
2018-03-01 0:00 GMT+01:00 Rong Rong <walter...@gmail.com>:

> Hi Timo,
>
> Thanks for initiating the SQL client effort. I agree with Xingcan's
> points. Adding to them: (1) most users of the SQL client will very
> likely have little Maven / build tool knowledge, and (2) the build
> script will most likely grow much more complex in the future, which
> makes it exponentially harder for users to modify it themselves.
>
> On (3), the single "fat" jar idea: in addition to the dependency
> conflict issue, another very common pattern I see is that users want
> to maintain a list of individual jars, such as a set of relatively
> constant, handy UDFs, every time they use the SQL client. They will
> probably need to package and ship those separately anyway. I was
> wondering if "download-and-drop-in" might be a more straightforward
> approach?
>
> Best,
> Rong
>
> On Tue, Feb 27, 2018 at 8:23 AM, Stephan Ewen <se...@apache.org> wrote:
>
> > I think one problem with the "one fat jar for all" is that some
> > dependencies clash in the classnames across versions:
> >   - Kafka 0.9, 0.10, 0.11, 1.0
> >   - Elasticsearch 2, 4, and 5
> >
> > There are probably others as well...
> >
> > On Tue, Feb 27, 2018 at 2:57 PM, Timo Walther <twal...@apache.org> wrote:
> >
> > > Hi Xingcan,
> > >
> > > thank you for your feedback. Regarding (3) we also thought about
> > > that, but this approach would not scale very well. Given that we
> > > might have fat jars for multiple versions (Kafka 0.8, Kafka 0.6
> > > etc.), such an all-in-one solution JAR file might easily go beyond
> > > 1 or 2 GB. I don't know if users want to download that just for a
> > > combination of connector and format.
> > >
> > > Timo
> > >
> > > On 2/27/18 at 2:16 PM, Xingcan Cui wrote:
> > >
> > >> Hi Timo,
> > >>
> > >> thanks for your efforts. Personally, I think the second option
> > >> would be better and here are my feelings.
> > >>
> > >> (1) The SQL client is designed to offer a convenient way for users
> > >> to manipulate data with Flink. Obviously, the second option would
> > >> be easier to use.
> > >>
> > >> (2) The script will help to manage the dependencies automatically,
> > >> but with less flexibility. Once the script cannot meet the need,
> > >> users have to modify it themselves.
> > >>
> > >> (3) I wonder whether we could package all these built-in
> > >> connectors and formats into a single JAR. With this all-in-one
> > >> solution, users don't need to consider much about the
> > >> dependencies.
> > >>
> > >> Best,
> > >> Xingcan
> > >>
> > >> On 27 Feb 2018, at 6:38 PM, Stephan Ewen <se...@apache.org> wrote:
> > >>
> > >>> My first intuition would be to go for approach #2 for the
> > >>> following reasons:
> > >>>
> > >>> - I expect that in the long run, the scripts will not be that
> > >>>   simple to maintain. We saw that with all shell scripts thus
> > >>>   far: they start simple, and then grow with many special cases
> > >>>   for this and that setup.
> > >>>
> > >>> - Not all users have Maven. Automatically downloading and
> > >>>   configuring Maven could be an option, but that makes the
> > >>>   scripts yet more tricky.
> > >>>
> > >>> - Download-and-drop-in is probably still easier to understand for
> > >>>   users than the syntax of a script with its parameters.
> > >>>
> > >>> - I think it may actually be even simpler to maintain for us,
> > >>>   because all it does is add a profile or build target to each
> > >>>   connector to also create the fat jar.
> > >>>
> > >>> - Storage space is no longer really a problem. Worst case we host
> > >>>   the fat jars in an S3 bucket.
> > >>>
> > >>> On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <twal...@apache.org> wrote:
> > >>>
> > >>>> Hi everyone,
> > >>>>
> > >>>> as you may know, a first minimum version of FLIP-24 [1] for the
> > >>>> upcoming Flink SQL Client has been merged to the master. We also
> > >>>> merged possibilities to discover and configure table sources
> > >>>> without a single line of code using string-based properties [2]
> > >>>> and Java service provider discovery.
> > >>>>
> > >>>> We are now facing the issue of how to manage dependencies in
> > >>>> this new environment. It is different from how regular Flink
> > >>>> projects are created (by setting up a new Maven project and
> > >>>> building a jar or fat jar). Ideally, a user should be able to
> > >>>> select from a set of prepared connectors, catalogs, and formats.
> > >>>> E.g., if a Kafka connector and Avro format are needed, all that
> > >>>> should be required is to move a "flink-kafka.jar" and
> > >>>> "flink-avro.jar" into the "sql_lib" directory that is shipped to
> > >>>> a Flink cluster together with the SQL query.
> > >>>>
> > >>>> The question is how do we want to offer those JAR files in the
> > >>>> future? We see two options:
> > >>>>
> > >>>> 1) We prepare Maven build profiles for all offered modules and
> > >>>> provide a shell script for building fat jars. A script call
> > >>>> could look like "./sql-client-dependency.sh kafka 0.10". It
> > >>>> would automatically download what is needed and place the JAR
> > >>>> file in the library folder. This approach would keep our
> > >>>> development effort low but would require Maven to be present and
> > >>>> builds to pass on different environments (e.g. Windows).
> > >>>>
> > >>>> 2) We build fat jars for these modules with every Flink release
> > >>>> that can be hosted somewhere (e.g. Apache infrastructure, but
> > >>>> not Maven central). This would make it very easy to add a
> > >>>> dependency by downloading the prepared JAR files. However, it
> > >>>> would require building and hosting large fat jars for every
> > >>>> connector (and version) with every Flink major and minor
> > >>>> release. The size of such a repository might grow quickly.
> > >>>>
> > >>>> What do you think? Do you see other options to make adding
> > >>>> dependencies as easy as possible?
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Timo
> > >>>>
> > >>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > >>>>
> > >>>> [2] https://issues.apache.org/jira/browse/FLINK-8240
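
For illustration only, here is a minimal sketch of what "download-and-drop-in" could mean on the client side: collect every JAR found in the "sql_lib" directory mentioned in the thread, put them on a class loader, and let Java service provider discovery find implementations. It uses only the JDK (`URLClassLoader`, `ServiceLoader`). The class and method names (`SqlLibJars`, `listJars`, `classLoaderFor`, `discover`) are hypothetical and not taken from the Flink code base, and `Runnable` merely stands in for whatever factory interface the SQL client actually uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper, not part of Flink: a sketch of the drop-in idea.
final class SqlLibJars {

    /** Returns all *.jar files directly inside libDir, sorted by path. */
    static List<Path> listJars(Path libDir) {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar);
            }
        } catch (IOException e) {
            // Also thrown when libDir does not exist or is not a directory.
            throw new UncheckedIOException(e);
        }
        Collections.sort(jars);
        return jars;
    }

    /** Builds a class loader that sees the given jars in addition to the parent. */
    static URLClassLoader classLoaderFor(List<Path> jars, ClassLoader parent) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = jars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return new URLClassLoader(urls, parent);
    }

    /** Collects all providers of the given service interface visible to the loader. */
    static <T> List<T> discover(Class<T> service, ClassLoader loader) {
        List<T> found = new ArrayList<>();
        for (T impl : ServiceLoader.load(service, loader)) {
            found.add(impl);
        }
        return found;
    }

    public static void main(String[] args) {
        // "sql_lib" is the directory name used in the thread; override via args[0].
        Path libDir = Paths.get(args.length > 0 ? args[0] : "sql_lib");
        List<Path> jars = listJars(libDir);
        ClassLoader loader = classLoaderFor(jars, SqlLibJars.class.getClassLoader());
        // Runnable is only a placeholder for a real factory interface (assumption).
        int providers = discover(Runnable.class, loader).size();
        System.out.println(jars.size() + " jar(s), " + providers + " provider(s)");
    }
}
```

With this shape, adding a connector amounts to copying one more JAR into the directory; no build tool is involved on the user's machine. Whether clashing class names across connector versions can share one class loader is exactly the concern raised in the thread, and this sketch does not address it.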