Re: [DISCUS] Flink SQL Client dependency management

Fabian Hueske Thu, 01 Mar 2018 02:39:35 -0800

I agree, option (2) would be the easiest approach for the users.


2018-03-01 0:00 GMT+01:00 Rong Rong <[email protected]>:

> Hi Timo,
>
> Thanks for initiating the SQL client effort. I agree with Xingcan's
> points. Adding to them: (1) most SQL client users will very likely have
> little Maven or build-tool knowledge, and (2) the build script will most
> likely grow much more complex over time, which makes it increasingly hard
> for users to modify it themselves.
>
> On (3), the single "fat" jar idea: in addition to the dependency conflict
> issue, a very common pattern I see is that users want to maintain their
> own list of individual jars, such as a set of relatively constant, handy
> UDFs that they use every time they run the SQL client. They will probably
> need to package and ship those separately anyway. I was wondering whether
> "download-and-drop-in" might be a more straightforward approach?
>
> Best,
> Rong
>
> On Tue, Feb 27, 2018 at 8:23 AM, Stephan Ewen <[email protected]> wrote:
>
> > I think one problem with the "one fat jar for all" approach is that some
> > dependencies have clashing class names across versions:
> >   - Kafka 0.9, 0.10, 0.11, 1.0
> >   - Elasticsearch 2, 4, and 5
> >
> > There are probably others as well...
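The class-name clash described above can be checked mechanically before jars are merged. The following is a minimal sketch, assuming the connector jars are available as local files; the function name `find_class_clashes` is illustrative and not part of Flink.

```python
import zipfile
from collections import defaultdict


def find_class_clashes(jar_paths):
    """Return {class entry: [jars]} for .class entries present in 2+ jars."""
    owners = defaultdict(list)
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            # A jar is a zip archive; de-duplicate entries within one jar.
            for name in sorted(set(zf.namelist())):
                if name.endswith(".class"):
                    owners[name].append(str(jar))
    # Keep only entries that more than one jar provides.
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

A non-empty result means that merging those jars into one fat jar would leave only one copy of each clashing class on the classpath, which is the problem with bundling several versions of the same client library.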
> >
> > On Tue, Feb 27, 2018 at 2:57 PM, Timo Walther <[email protected]> wrote:
> >
> > > Hi Xingcan,
> > >
> > > thank you for your feedback. Regarding (3), we also thought about that,
> > > but this approach would not scale very well. Given that we might have fat
> > > jars for multiple versions (Kafka 0.8, Kafka 0.6 etc.), such an all-in-one
> > > JAR file might easily go beyond 1 or 2 GB. I don't know if users want to
> > > download that just for a combination of connector and format.
> > >
> > > Timo
> > >
> > >
> > > On 2/27/18 at 2:16 PM, Xingcan Cui wrote:
> > >
> > > Hi Timo,
> > >>
> > >> thanks for your efforts. Personally, I think the second option would be
> > >> better, and here are my thoughts.
> > >>
> > >> (1) The SQL client is designed to offer a convenient way for users to
> > >> manipulate data with Flink. Obviously, the second option would be
> > >> easier to use.
> > >>
> > >> (2) The script will help to manage the dependencies automatically, but
> > >> with less flexibility. Once the script cannot meet their needs, users
> > >> have to modify it themselves.
> > >>
> > >> (3) I wonder whether we could package all these built-in connectors and
> > >> formats into a single JAR. With this all-in-one solution, users don’t
> > >> need to think much about the dependencies.
> > >>
> > >> Best,
> > >> Xingcan
> > >>
> > >> On 27 Feb 2018, at 6:38 PM, Stephan Ewen <[email protected]> wrote:
> > >>>
> > >>> My first intuition would be to go for approach #2 for the following
> > >>> reasons:
> > >>>
> > >>> - I expect that in the long run, the scripts will not be that simple to
> > >>> maintain. We saw that with all shell scripts thus far: they start
> > >>> simple, and then grow with many special cases for this and that setup.
> > >>>
> > >>> - Not all users have Maven. Automatically downloading and configuring
> > >>> Maven could be an option, but that makes the scripts yet more tricky.
> > >>>
> > >>> - Download-and-drop-in is probably still easier for users to understand
> > >>> than the syntax of a script with its parameters.
> > >>>
> > >>> - I think it may actually be even simpler for us to maintain, because
> > >>> all it does is add a profile or build target to each connector to also
> > >>> create the fat jar.
> > >>>
> > >>> - Storage space is no longer really a problem. Worst case, we host the
> > >>> fat jars in an S3 bucket.
> > >>>
> > >>>
> > >>> On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther <[email protected]> wrote:
> > >>>
> > >>> Hi everyone,
> > >>>>
> > >>>> as you may know, a first minimal version of FLIP-24 [1] for the
> > >>>> upcoming Flink SQL Client has been merged to master. We also merged
> > >>>> the ability to discover and configure table sources without a single
> > >>>> line of code, using string-based properties [2] and Java service
> > >>>> provider discovery.
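Java service provider discovery relies on provider-configuration files stored under "META-INF/services/" inside each jar, one file per service interface, listing implementation class names. As a rough illustration of what a drop-in jar would advertise, here is a minimal sketch that lists those declarations; the function name and any interface or class names fed to it are hypothetical, not Flink's actual ones.

```python
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def list_service_providers(jar_path):
    """Return {service interface: [provider class names]} declared in a jar."""
    providers = {}
    with zipfile.ZipFile(jar_path) as zf:
        for name in zf.namelist():
            # Skip everything outside META-INF/services/ and the directory entry itself.
            if not name.startswith(SERVICES_PREFIX) or name.endswith("/"):
                continue
            interface = name[len(SERVICES_PREFIX):]
            lines = zf.read(name).decode("utf-8").splitlines()
            # The file format allows '#' comments and blank lines; drop both.
            classes = [line.split("#", 1)[0].strip() for line in lines]
            providers[interface] = [c for c in classes if c]
    return providers
```

Running this over the jars in a library directory shows which factories a client could discover at runtime without any user code.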
> > >>>>
> > >>>> We are now facing the issue of how to manage dependencies in this new
> > >>>> environment. It is different from how regular Flink projects are
> > >>>> created (by setting up a new Maven project and building a jar or fat
> > >>>> jar). Ideally, a user should be able to select from a set of prepared
> > >>>> connectors, catalogs, and formats. E.g., if a Kafka connector and Avro
> > >>>> format are needed, all that should be required is to move a
> > >>>> "flink-kafka.jar" and a "flink-avro.jar" into the "sql_lib" directory
> > >>>> that is shipped to a Flink cluster together with the SQL query.
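The drop-in step described above amounts to copying prepared jars into the library directory. A minimal sketch, assuming the jars have already been downloaded locally; the helper name `drop_in` and the no-overwrite policy are assumptions for illustration, not part of the SQL Client.

```python
import shutil
from pathlib import Path


def drop_in(jar_paths, lib_dir="sql_lib"):
    """Copy the given JAR files into lib_dir, refusing to overwrite existing files."""
    lib = Path(lib_dir)
    lib.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in map(Path, jar_paths):
        target = lib / jar.name
        if target.exists():
            # Fail loudly instead of silently replacing a different version.
            raise FileExistsError(f"{target} already exists")
        shutil.copy2(jar, target)
        copied.append(target)
    return copied
```

For example, `drop_in(["flink-kafka.jar", "flink-avro.jar"])` would populate "sql_lib" with the two jars named in the message above.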
> > >>>>
> > >>>> The question is: how do we want to offer those JAR files in the
> > >>>> future? We see two options:
> > >>>>
> > >>>> 1) We prepare Maven build profiles for all offered modules and provide
> > >>>> a shell script for building fat jars. A script call could look like
> > >>>> "./sql-client-dependency.sh kafka 0.10". It would automatically
> > >>>> download what is needed and place the JAR file in the library folder.
> > >>>> This approach would keep our development effort low but would require
> > >>>> Maven to be present and builds to pass in different environments (e.g.
> > >>>> Windows).
> > >>>>
> > >>>> 2) We build fat jars for these modules with every Flink release that
> > >>>> can be hosted somewhere (e.g. Apache infrastructure, but not Maven
> > >>>> Central). This would make it very easy to add a dependency by
> > >>>> downloading the prepared JAR files. However, it would require us to
> > >>>> build and host large fat jars for every connector (and version) with
> > >>>> every Flink major and minor release. The size of such a repository
> > >>>> might grow quickly.
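How quickly such a repository grows is simple multiplication: jars per release times number of releases times average jar size. The sketch below makes that explicit; the release count and average jar size in the usage note are purely hypothetical inputs, not measurements.

```python
def hosted_size_gb(connector_versions, flink_releases, avg_jar_mb):
    """Estimate jar count and total size if every connector version is rebuilt per release.

    connector_versions maps a connector name to its number of supported versions.
    """
    jars_per_release = sum(connector_versions.values())
    total_jars = jars_per_release * flink_releases
    # Convert megabytes to gigabytes (1 GB = 1024 MB).
    return total_jars, total_jars * avg_jar_mb / 1024
```

With the version counts mentioned elsewhere in this thread (four Kafka versions, three Elasticsearch versions) and assumed values of six releases and 30 MB per jar, `hosted_size_gb({"kafka": 4, "elasticsearch": 3}, 6, 30)` gives 42 jars and roughly 1.2 GB.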
> > >>>>
> > >>>> What do you think? Do you see other options to make adding
> > >>>> dependencies as easy as possible?
> > >>>>
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > >>>>
> > >>>> [2] https://issues.apache.org/jira/browse/FLINK-8240
> > >>>>
> > >>>>
> > >>>>
> > >
> >
>
