+1 for "slim" and "fat" solution. One comment about the fat one, I think we need to put all needed jars into /lib (or /plugins). Put jars into /opt and relying on users moving them from /opt to /lib doesn't really improve the out-of-box experience.
Best,
Kurt

On Fri, Apr 24, 2020 at 8:28 PM Aljoscha Krettek <aljos...@apache.org> wrote:

re (1): I don't know about that; probably the people that did the metrics reporter plugin support had some thoughts about that.

re (2): I agree, that's why I initially suggested to split it into "slim" and "fat": our current "medium fat" selection of jars in Flink dist does not serve anyone too well. It's too fat for people that want to build lean application images. It's too lean for people that want a good first out-of-box experience.

Aljoscha

On 17.04.20 16:38, Stephan Ewen wrote:

@Aljoscha I think that is an interesting line of thinking. The swift-fs may be rarely enough used to move it to an optional download.

I would still drop two more thoughts:

(1) Now that we have plugins support, is there a reason to have a metrics reporter or file system in /opt instead of /plugins? They don't spoil the class path any more.

(2) I can imagine there still being a desire to have a "minimal" docker file, for users that want to keep the container images as small as possible, to speed up deployment. It is fine if that would not be the default, though.

On Fri, Apr 17, 2020 at 12:16 PM Aljoscha Krettek <aljos...@apache.org> wrote:

I think having such tools and/or tailor-made distributions can be nice, but I also think the discussion is missing the main point: the initial observation/motivation is that apparently a lot of users (Kurt and I talked about this) on the Chinese DingTalk support groups and other support channels have problems when first using the SQL client because of these missing connectors/formats. For these users, having additional tools would not solve anything because they would also not take that extra step. I think that even tiny friction should be avoided, because the annoyance from it accumulates over the (hopefully) many users that we want to have.

Maybe we should take a step back from discussing the "fat"/"slim" idea and instead think about the composition of the current dist. As mentioned, we have these jars in opt/:

    17M  flink-azure-fs-hadoop-1.10.0.jar
    52K  flink-cep-scala_2.11-1.10.0.jar
    180K flink-cep_2.11-1.10.0.jar
    746K flink-gelly-scala_2.11-1.10.0.jar
    626K flink-gelly_2.11-1.10.0.jar
    512K flink-metrics-datadog-1.10.0.jar
    159K flink-metrics-graphite-1.10.0.jar
    1.0M flink-metrics-influxdb-1.10.0.jar
    102K flink-metrics-prometheus-1.10.0.jar
    10K  flink-metrics-slf4j-1.10.0.jar
    12K  flink-metrics-statsd-1.10.0.jar
    36M  flink-oss-fs-hadoop-1.10.0.jar
    28M  flink-python_2.11-1.10.0.jar
    22K  flink-queryable-state-runtime_2.11-1.10.0.jar
    18M  flink-s3-fs-hadoop-1.10.0.jar
    31M  flink-s3-fs-presto-1.10.0.jar
    196K flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
    518K flink-sql-client_2.11-1.10.0.jar
    99K  flink-state-processor-api_2.11-1.10.0.jar
    25M  flink-swift-fs-hadoop-1.10.0.jar
    160M opt

The "filesystem" connectors are the heavy hitters there.
I downloaded most of the SQL connectors/formats and this is what I got:

    73K  flink-avro-1.10.0.jar
    36K  flink-csv-1.10.0.jar
    55K  flink-hbase_2.11-1.10.0.jar
    88K  flink-jdbc_2.11-1.10.0.jar
    42K  flink-json-1.10.0.jar
    20M  flink-sql-connector-elasticsearch6_2.11-1.10.0.jar
    2.8M flink-sql-connector-kafka_2.11-1.10.0.jar
    24M  sql-connectors-formats

We could just add these to the Flink distribution without blowing it up by much. We could drop any of the existing "filesystem" connectors from opt, add the SQL connectors/formats, and not change the size of Flink dist. So maybe we should do that instead?

We would need some tooling for the sql-client shell script to pick up the connectors/formats from opt/, because we don't want to add them to lib/. We're already doing that for finding the flink-sql-client jar, which is also not in lib/.
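Just to sketch the idea (this is not the real script; the wrapper and the wildcard patterns are only illustrative, and it simply reuses the SQL Client's existing --jar option):

    #!/usr/bin/env bash
    # sketch: hand the connectors/formats that ship in opt/ to the SQL Client
    # via its existing --jar option, so nothing has to be copied into lib/
    FLINK_HOME="$(cd "$(dirname "$0")/.." && pwd)"
    JAR_ARGS=()
    for jar in "$FLINK_HOME"/opt/flink-sql-connector-*.jar \
               "$FLINK_HOME"/opt/flink-csv-*.jar \
               "$FLINK_HOME"/opt/flink-json-*.jar \
               "$FLINK_HOME"/opt/flink-avro-*.jar; do
      [ -f "$jar" ] && JAR_ARGS+=(--jar "$jar")
    done
    exec "$FLINK_HOME"/bin/sql-client.sh embedded "${JAR_ARGS[@]}" "$@"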
What do you think?

Best,
Aljoscha

On 17.04.20 05:22, Jark Wu wrote:

Hi,

I like the idea of a web tool to assemble a fat distribution, and https://code.quarkus.io/ looks very nice. All the users need to do is just select what they need (I think this step can't be omitted anyway). We can also provide a default fat distribution on the web which pre-selects some popular connectors.

Best,
Jark

On Fri, 17 Apr 2020 at 02:29, Rafi Aroch <rafi.ar...@gmail.com> wrote:

As a reference for a nice first experience I had, take a look at https://code.quarkus.io/ — you reach this page after you click "Start Coding" on the project homepage.

Rafi

On Thu, Apr 16, 2020 at 6:53 PM Kurt Young <ykt...@gmail.com> wrote:

I'm not saying that pre-bundling some jars will make this problem go away, and you're right that it only hides the problem for some users. But what if this solution can hide the problem for 90% of users? Wouldn't that be good enough for us to try?

Regarding "would users following instructions really be such a big problem?": I'm afraid yes. Otherwise I wouldn't have answered such questions at least a dozen times, and I wouldn't see such questions coming up from time to time. During some periods, I even saw such questions every day.

Best,
Kurt

On Thu, Apr 16, 2020 at 11:21 PM Chesnay Schepler <ches...@apache.org> wrote:

The problem with having a distribution with "popular" stuff is that it doesn't really *solve* a problem, it just hides it for users who fall into these particular use-cases. Move out of them and you once again run into the exact same problems outlined above.

This is exactly why I like the tooling approach; you have to deal with it from the start, and transitioning to a custom use-case is easier.

Would users following instructions really be such a big problem? I would expect that users generally know *what* they need, just not necessarily how it is assembled correctly (where to get which jar, which directory to put it in). It seems like these are exactly the problems this would solve? I just don't see how moving a jar corresponding to some feature from opt to some directory (lib/plugins) is less error-prone than just selecting the feature and having the tool handle the rest.

As for re-distributions, it depends on the form that the tool would take. It could be an application that runs locally and works against Maven Central (note: not necessarily *using* Maven); this should work in China, no?

A web tool would of course be fancy, but I don't know how feasible this is with the ASF infrastructure. You wouldn't be able to mirror the distribution, so the load can't be distributed. I doubt INFRA would like this.

Note that third parties could also start distributing use-case-oriented distributions, which would be perfectly fine as far as I'm concerned.

On 16/04/2020 16:57, Kurt Young wrote:

I'm not so sure about the web tool solution though. The concern I have with this approach is that the final generated distribution is kind of non-deterministic. We might generate too many different combinations when users try to package different types of connectors, formats, and maybe even Hadoop releases. As far as I can tell, most open source and Apache projects only release a few pre-defined distributions, which most users are already familiar with and which are thus hard to change IMO. I have also seen cases where users re-distribute the release package because of unstable network access to the Apache website from China. With a web tool solution, I don't think this kind of re-distribution would be possible anymore.

In the meantime, I also have a concern that we will fall into our trap again if we try to offer this smart & flexible solution, because it needs users to cooperate with such a mechanism. It's exactly the situation we currently fell into:
1. We offered a smart solution.
2. We hope users will follow the correct instructions.
3. Everything will work as expected if users followed the right instructions.

In reality, I suspect not all users will do the second step correctly. And for new users who are only trying to have a quick experience with Flink, I would bet most will do it wrong.

So, my proposal would be one of the following 2 options:
1. Provide a slim distribution for advanced production users and provide a distribution which has some popular built-in jars.
2. Only provide a distribution which has some popular built-in jars.

If we are trying to reduce the distributions we release, I would prefer 2.

Best,
Kurt

On Thu, Apr 16, 2020 at 9:33 PM Till Rohrmann <trohrm...@apache.org> wrote:

I think what Chesnay and Dawid proposed would be the ideal solution. Ideally, we would also have a nice web tool for the website which generates the corresponding distribution for download.
To get things started, we could begin by only supporting downloading/creating the "fat" version with the script. The fat version would then consist of the slim distribution plus whatever we deem important for new users to get started.

Cheers,
Till

On Thu, Apr 16, 2020 at 11:33 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

Hi all,

A few points from my side:

1. I like the idea of simplifying the experience for first-time users. As for production use cases, I share Jark's opinion that there I would expect users to combine their distribution manually. I think in such scenarios it is important to understand the interconnections. Personally I'd expect the slimmest possible distribution that I can extend further with what I need in my production scenario.

2. I think there is also the problem that the matrix of possible combinations that can be useful is already big. Do we want to have a distribution for:
- SQL users: which connectors should we include? Should we include Hive? Which other catalog?
- DataStream users: which connectors should we include?
- For both of the above, should we include YARN/Kubernetes?
I would opt for providing only the "slim" distribution as a release artifact.

3. However, as I said, I think it's worth investigating how we can improve the user experience. What do you think of providing a tool, e.g. a shell script, that constructs a distribution based on the user's choice? I think that is also what Chesnay mentioned as "tooling to assemble custom distributions". In the end, the difference between a slim and a fat distribution comes down to which jars we put into lib/, right? It could have a few "screens":

1. Which API are you interested in?
   a. SQL API
   b. DataStream API

2. [SQL] Which connectors do you want to use? [multichoice]
   a. Kafka
   b. Elasticsearch
   ...

3. [SQL] Which catalog do you want to use?
   ...

Such a tool would download all the dependencies from Maven and put them into the correct folder. In the future we can extend it with additional rules, e.g. kafka-0.9 cannot be chosen at the same time as kafka-universal, etc.
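Very roughly, the download step could look like this (a sketch only; the script name and the way artifacts are passed in from the "screens" are just illustrative):

    #!/usr/bin/env bash
    # sketch: fetch the selected connectors/formats from Maven Central into lib/
    FLINK_VERSION="1.10.0"
    REPO="https://repo1.maven.org/maven2/org/apache/flink"
    cd "$(dirname "$0")/../lib" || exit 1
    for artifact in "$@"; do   # e.g. flink-json flink-csv flink-sql-connector-kafka_2.11
      curl -fLO "${REPO}/${artifact}/${FLINK_VERSION}/${artifact}-${FLINK_VERSION}.jar"
    done

An invocation like "assemble-dist.sh flink-json flink-csv flink-sql-connector-kafka_2.11" would then turn the slim distribution into a Kafka+JSON-capable one.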
The benefit of it would be that the distribution that we release could remain "slim", or we could even make it slimmer. I might be missing something here though.

Best,
Dawid

On 16/04/2020 11:02, Aljoscha Krettek wrote:

I want to reinforce my opinion from earlier: this is about improving the situation both for first-time users and for experienced users that want to use a Flink dist in production. The current Flink dist is too "thin" for first-time SQL users and it is too "fat" for production users, so we are serving no one properly with the current middle ground. That's why I think introducing those specialized "spins" of Flink dist would be good.

By the way, at some point in the future production users might not even need to get a Flink dist anymore. They should be able to have Flink as a dependency of their project (including the runtime) and then build an image from this for Kubernetes or a fat jar for YARN.

Aljoscha

On 15.04.20 18:14, wenlong.lwl wrote:

Hi all,

Regarding slim and fat distributions, I think different kinds of jobs may prefer different types of distribution:

For DataStream jobs, I think we may not want a fat distribution containing connectors, because users always need to depend on the connector in their user code anyway, and it is easy to include the connector jar in the user lib. Fewer jars in lib means fewer class conflicts and problems.

For SQL jobs, I think we are trying to encourage users to use pure SQL (DDL + DML) to construct their jobs. In order to improve the user experience, it may be important for Flink not only to provide as many connector jars in the distribution as possible (especially the connectors and formats we have documented well), but also to provide a mechanism to load connectors according to the DDLs.

So I think it could be good to place connector/format jars in some directory like opt/connector, which would not affect jobs by default, and to introduce a mechanism of dynamic discovery for SQL.

Best,
Wenlong

On Wed, 15 Apr 2020 at 22:46, Jingsong Li <jingsongl...@gmail.com> wrote:

Hi,

I am thinking about both "improve the first experience" and "improve the production experience".

I'm thinking about what the common mode of Flink is: streaming jobs use Kafka? Batch jobs use Hive?

Hive 1.2.1 dependencies can be compatible with most Hive server versions, so Spark and Presto have a built-in Hive 1.2.1 dependency. Flink is currently mainly used for streaming, so let's not talk about Hive.

For streaming jobs, the jobs I have in mind are (with respect to connectors):
- ETL jobs: Kafka -> Kafka
- Join jobs: Kafka -> DimJDBC -> Kafka
- Aggregation jobs: Kafka -> JDBCSink
So Kafka and JDBC are probably the most commonly used; of course, this also includes the CSV and JSON formats. So we could provide a fat distribution:
- with CSV and JSON;
- with flink-kafka-universal and Kafka dependencies;
- with flink-jdbc.
Using this fat distribution, most users can run their jobs well (a JDBC driver jar is required, but that is very natural to provide). Can these dependencies lead to conflicts? Only Kafka may have conflicts, but if our goal is to use kafka-universal to support all Kafka versions, we can hope to cover the vast majority of users.
We don't want to put all jars into the fat distribution, only the less conflict-prone and most common ones; of course, which jars go into the fat distribution is a matter of consideration. We have the opportunity to make things easier for the majority of users, while also leaving room for customization.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 10:09 PM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think we should first reach a consensus on "what problem do we want to solve?": (1) improve the first experience, or (2) improve the production experience?

As far as I can see from the above discussion, what we want to solve is the "first experience". And I think the slim distribution is still the best for production, because it's easier to assemble jars than to exclude jars, and it can avoid potential class conflicts.

If we want to improve the "first experience", I think it makes sense to have a fat distribution to give users a smoother first experience. But I would like to call it a "playground distribution" or something like that, to explicitly distinguish it from the "slim production-purpose distribution". The "playground distribution" can contain some widely used jars, like the universal-kafka-sql-connector, elasticsearch7-sql-connector, avro, json, csv, etc. We could even provide a playground docker image which contains the fat distribution, python3, and hive.

Best,
Jark

On Wed, 15 Apr 2020 at 21:47, Chesnay Schepler <ches...@apache.org> wrote:

I don't see a lot of value in having multiple distributions.

The simple reality is that no fat distribution we could provide would satisfy all use-cases, so why even try. If users commonly run into issues for certain jars, then maybe those should be added to the current distribution.

Personally though, I still believe we should only distribute a slim version. I'd rather have users always add required jars to the distribution than only when they go outside our "expected" use-cases. Then we might finally address this issue properly, i.e., tooling to assemble custom distributions and/or better error messages if Flink-provided extensions cannot be found.

On 15/04/2020 15:23, Kurt Young wrote:

Regarding the specific solution, I'm not sure about the "fat" and "slim" approach though. I get the idea that we can make the slim one even more lightweight than the current distribution, but what about the "fat" one? Do you mean that we would package all connectors and formats into it? I'm not sure that is feasible.
For example, we can't put all versions of the Kafka and Hive connector jars into the lib directory, and we also might need Hadoop jars when using the filesystem connector to access data from HDFS.

So my guess would be that we hand-pick some of the most frequently used connectors and formats for our lib directory, like the Kafka, CSV, and JSON ones mentioned above, and still leave some other connectors out of it. If this is the case, then why don't we just provide this one distribution to users? I'm not sure I see the benefit of providing another super "slim" distribution (we have to pay some cost to provide another suite of distributions).

What do you think?

Best,
Kurt

On Wed, Apr 15, 2020 at 7:08 PM Jingsong Li <jingsongl...@gmail.com> wrote:

Big +1.

I like "fat" and "slim".

For csv and json, like Jark said, they are quite small and don't have other dependencies. They are important to the Kafka connector, and important to the upcoming filesystem connector too. So can we put them into both "fat" and "slim"? They're so important, and they're so lightweight.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 4:53 PM godfrey he <godfre...@gmail.com> wrote:

Big +1. This will improve the user experience (especially for new Flink users). We have answered so many questions about "class not found".

Best,
Godfrey

On Wed, Apr 15, 2020 at 4:30 PM, Dian Fu <dian0511...@gmail.com> wrote:

+1 to this proposal.

Missing connector jars is also a big problem for PyFlink users. Currently, after a Python user has installed PyFlink using `pip`, they have to manually copy the connector fat jars to the PyFlink installation directory for the connectors to be usable when running jobs locally. This process is very confusing for users and affects the experience a lot.

Regards,
Dian

On Apr 15, 2020, at 3:51 PM, Jark Wu <imj...@gmail.com> wrote:

+1 to the proposal. I also found the "download additional jar" step really verbose when I prepared webinars.

At the very least, I think flink-csv and flink-json should be in the distribution; they are quite small and don't have other dependencies.
Best,
Jark

On Wed, 15 Apr 2020 at 15:44, Jeff Zhang <zjf...@gmail.com> wrote:

Hi Aljoscha,

Big +1 for the fat Flink distribution. Where do you plan to put these connectors, opt or lib?

On Wed, Apr 15, 2020 at 3:30 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

Hi Everyone,

I'd like to discuss releasing a more full-featured Flink distribution. The motivation is that there is friction for SQL/Table API users that want to use Table connectors which are not in the current Flink distribution. For these users the workflow is currently roughly:

- download Flink dist
- configure csv/Kafka/json connectors per configuration
- run SQL client or program
- decrypt the error message and research the solution
- download additional connector jars (see the sketch at the end of this mail)
- program works correctly

I realize that this can be made to work, but if every SQL user has this as their first experience, that doesn't seem good to me.

My proposal is to provide two versions of the Flink distribution in the future: "fat" and "slim" (names to be discussed):

- slim would be even trimmer than today's distribution
- fat would contain a lot of convenience connectors (yet to be determined which ones)

And yes, I realize that there are already more dimensions of Flink releases (Scala version and Java version).

For background, our current Flink dist has these in the opt directory:

- flink-azure-fs-hadoop-1.10.0.jar
- flink-cep-scala_2.12-1.10.0.jar
- flink-cep_2.12-1.10.0.jar
- flink-gelly-scala_2.12-1.10.0.jar
- flink-gelly_2.12-1.10.0.jar
- flink-metrics-datadog-1.10.0.jar
- flink-metrics-graphite-1.10.0.jar
- flink-metrics-influxdb-1.10.0.jar
- flink-metrics-prometheus-1.10.0.jar
- flink-metrics-slf4j-1.10.0.jar
- flink-metrics-statsd-1.10.0.jar
- flink-oss-fs-hadoop-1.10.0.jar
- flink-python_2.12-1.10.0.jar
- flink-queryable-state-runtime_2.12-1.10.0.jar
- flink-s3-fs-hadoop-1.10.0.jar
- flink-s3-fs-presto-1.10.0.jar
- flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
- flink-sql-client_2.12-1.10.0.jar
- flink-state-processor-api_2.12-1.10.0.jar
- flink-swift-fs-hadoop-1.10.0.jar

The current Flink dist is 267M. If we removed everything from opt we would go down to 126M. I would recommend this, because the large majority of the files in opt are probably unused.
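To make the friction concrete, the "download additional connector jars" step from the list above currently boils down to something like this (versions and URLs are only illustrative):

    # manual fix-up a first-time SQL user has to discover on their own
    cd flink-1.10.0/lib
    wget https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.10.0/flink-sql-connector-kafka_2.11-1.10.0.jar
    wget https://repo1.maven.org/maven2/org/apache/flink/flink-json/1.10.0/flink-json-1.10.0.jar
    # then restart the (local) cluster / SQL client so the new jars are picked up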
What do you think?

Best,
Aljoscha

--
Best Regards

Jeff Zhang

--
Best, Jingsong Lee

--
Best, Jingsong Lee