+1 to include them for sql-client by default; +0 to put them into lib and expose them to all kinds of jobs, including DataStream.
Danny Chan <yuzhao....@gmail.com> wrote on Fri, Jun 5, 2020 at 2:31 PM:

+1. At least we should keep an out-of-the-box SQL CLI; it's a very poor experience to make SQL users add such required format jars.

Best,
Danny Chan

On Jun 5, 2020 at 11:14 AM +0800, Jingsong Li <jingsongl...@gmail.com> wrote:

Hi all,

Considering that 1.11 will be released soon, what about my previous proposal? Put flink-csv, flink-json and flink-avro under lib. These three formats are very small, have no third-party dependencies, and are widely used by table users.

Best,
Jingsong Lee

On Tue, May 12, 2020 at 4:19 PM Jingsong Li <jingsongl...@gmail.com> wrote:

Thanks for your discussion.

Sorry to start discussing another thing: the biggest problem I see is the variety of problems caused by users' missing format dependencies. As Aljoscha said, these three formats are very small, have no third-party dependencies, and are widely used by table users. Actually, we don't have any other built-in table formats now... 151K in total:

73K flink-avro-1.10.0.jar
36K flink-csv-1.10.0.jar
42K flink-json-1.10.0.jar

So, can we just put them into "lib/" or flink-table-uber? It doesn't solve all problems, and maybe it is independent of "fat" and "slim", but it would improve usability. What do you think? Any objections?

Best,
Jingsong Lee

On Mon, May 11, 2020 at 5:48 PM Chesnay Schepler <ches...@apache.org> wrote:

One downside would be that we're shipping more stuff when running on YARN, for example, since the entire plugins directory is shipped by default.

On 17/04/2020 16:38, Stephan Ewen wrote:

@Aljoscha I think that is an interesting line of thinking. The swift-fs may be rarely enough used to move it to an optional download.

I would still drop two more thoughts:

(1) Now that we have plugins support, is there a reason to have a metrics reporter or file system in /opt instead of /plugins? They don't spoil the class path any more.

(2) I can imagine there still being a desire to have a "minimal" docker file, for users that want to keep the container images as small as possible, to speed up deployment. It is fine if that would not be the default, though.

On Fri, Apr 17, 2020 at 12:16 PM Aljoscha Krettek <aljos...@apache.org> wrote:

I think having such tools and/or tailor-made distributions can be nice, but I also think the discussion is missing the main point: the initial observation/motivation is that apparently a lot of users (Kurt and I talked about this) on the Chinese DingTalk support groups and other support channels have problems when first using the SQL client because of these missing connectors/formats. For them, having additional tools would not solve anything, because they would not take that extra step either. I think that even tiny friction should be avoided, because the annoyance from it accumulates over the (hopefully) many users that we want to have.

Maybe we should take a step back from discussing the "fat"/"slim" idea and instead think about the composition of the current dist. As mentioned, we have these jars in opt/:

17M flink-azure-fs-hadoop-1.10.0.jar
52K flink-cep-scala_2.11-1.10.0.jar
180K flink-cep_2.11-1.10.0.jar
746K flink-gelly-scala_2.11-1.10.0.jar
626K flink-gelly_2.11-1.10.0.jar
512K flink-metrics-datadog-1.10.0.jar
159K flink-metrics-graphite-1.10.0.jar
1.0M flink-metrics-influxdb-1.10.0.jar
102K flink-metrics-prometheus-1.10.0.jar
10K flink-metrics-slf4j-1.10.0.jar
12K flink-metrics-statsd-1.10.0.jar
36M flink-oss-fs-hadoop-1.10.0.jar
28M flink-python_2.11-1.10.0.jar
22K flink-queryable-state-runtime_2.11-1.10.0.jar
18M flink-s3-fs-hadoop-1.10.0.jar
31M flink-s3-fs-presto-1.10.0.jar
196K flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
518K flink-sql-client_2.11-1.10.0.jar
99K flink-state-processor-api_2.11-1.10.0.jar
25M flink-swift-fs-hadoop-1.10.0.jar
160M opt

The "filesystem" connectors are the heavy hitters there.

I downloaded most of the SQL connectors/formats and this is what I got:

73K flink-avro-1.10.0.jar
36K flink-csv-1.10.0.jar
55K flink-hbase_2.11-1.10.0.jar
88K flink-jdbc_2.11-1.10.0.jar
42K flink-json-1.10.0.jar
20M flink-sql-connector-elasticsearch6_2.11-1.10.0.jar
2.8M flink-sql-connector-kafka_2.11-1.10.0.jar
24M sql-connectors-formats

We could add these to the Flink distribution without blowing it up by much. We could drop any of the existing "filesystem" connectors from opt, add the SQL connectors/formats, and not change the size of Flink dist. So maybe we should do that instead?
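Until the dist composition changes, the fix users apply is manual: download each small format jar from Maven Central into the distribution. A minimal sketch of that step, assuming the standard Maven repository layout and a 1.10.0 version pin (the helper name is illustrative, not an existing script):

```shell
#!/usr/bin/env sh
# Sketch of the manual step the thread wants to eliminate: fetching the
# small format jars into Flink's lib/. Version pin and helper name are
# assumptions for illustration.
FLINK_VERSION="1.10.0"
MAVEN_REPO="https://repo.maven.apache.org/maven2"

# Compose the standard Maven-layout URL for an org.apache.flink artifact.
format_jar_url() {
    echo "${MAVEN_REPO}/org/apache/flink/$1/${FLINK_VERSION}/$1-${FLINK_VERSION}.jar"
}

for fmt in flink-csv flink-json flink-avro; do
    echo "would fetch into lib/: $(format_jar_url "$fmt")"
    # real usage would be something like:
    # curl -sSfL -o "lib/${fmt}-${FLINK_VERSION}.jar" "$(format_jar_url "$fmt")"
done
```

Small as this is, it is exactly the "decrypt error message, research, download jar" loop the proposal argues first-time users should never see.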
We would need some tooling for the sql-client shell script to pick the connectors/formats up from opt/, because we don't want to add them to lib/. We're already doing that for finding the flink-sql-client jar, which is also not in lib/.

What do you think?

Best,
Aljoscha

On 17.04.20 05:22, Jark Wu wrote:

Hi,

I like the idea of a web tool to assemble a fat distribution, and https://code.quarkus.io/ looks very nice. All users need to do is select what they need (I think this step can't be omitted anyway). We could also provide a default fat distribution on the web which pre-selects some popular connectors.

Best,
Jark

On Fri, 17 Apr 2020 at 02:29, Rafi Aroch <rafi.ar...@gmail.com> wrote:

As a reference for a nice first experience I had, take a look at https://code.quarkus.io/. You reach this page after you click "Start Coding" on the project homepage.

Rafi

On Thu, Apr 16, 2020 at 6:53 PM Kurt Young <ykt...@gmail.com> wrote:

I'm not saying pre-bundling some jars will make this problem go away, and you're right that it only hides the problem for some users. But what if this solution can hide the problem for 90% of users? Wouldn't that be good enough for us to try?

Regarding whether users following instructions would really be such a big problem: I'm afraid yes.
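The tooling Aljoscha mentions, having the sql-client script pick connector/format jars up from opt/, could be little more than a glob when the client classpath is built. A rough sketch, assuming a flat opt/ layout and these jar name patterns (this is not the actual sql-client.sh logic):

```shell
#!/usr/bin/env sh
# Sketch: collect SQL connector/format jars from opt/ into a classpath,
# similar in spirit to how sql-client.sh locates the flink-sql-client jar.
# The directory layout and name patterns are assumptions.
FLINK_OPT_DIR="${FLINK_OPT_DIR:-./opt}"

collect_sql_jars() {
    cp=""
    for jar in "$FLINK_OPT_DIR"/flink-sql-connector-*.jar \
               "$FLINK_OPT_DIR"/flink-csv-*.jar \
               "$FLINK_OPT_DIR"/flink-json-*.jar \
               "$FLINK_OPT_DIR"/flink-avro-*.jar; do
        [ -e "$jar" ] || continue        # skip patterns that matched nothing
        cp="${cp:+$cp:}$jar"
    done
    echo "$cp"
}

# The client would then append this to its own classpath, e.g.:
# exec java -cp "lib/*:$(collect_sql_jars)" ... SqlClient "$@"
```

The appeal of this approach is that the jars stay out of lib/, so DataStream jobs and the cluster classpath are unaffected.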
Otherwise I wouldn't have answered such questions at least a dozen times, and I wouldn't keep seeing them come up from time to time. During some periods I even saw such questions every day.

Best,
Kurt

On Thu, Apr 16, 2020 at 11:21 PM Chesnay Schepler <ches...@apache.org> wrote:

The problem with having a distribution with "popular" stuff is that it doesn't really *solve* a problem, it just hides it for users who fall into these particular use-cases. Move out of them and you once again run into the exact same problems outlined. This is exactly why I like the tooling approach; you have to deal with it from the start, and transitioning to a custom use-case is easier.

Would users following instructions really be such a big problem? I would expect that users generally know *what* they need, just not necessarily how it is assembled correctly (where to get which jar, which directory to put it in). It seems like these are exactly the problems this would solve? I just don't see how moving a jar corresponding to some feature from opt to some directory (lib/plugins) is less error-prone than selecting the feature and having the tool handle the rest.

As for re-distribution, it depends on the form that the tool would take. It could be an application that runs locally and works against Maven Central (note: not necessarily *using* Maven); this should work in China, no?

A web tool would of course be fancy, but I don't know how feasible that is with the ASF infrastructure. You wouldn't be able to mirror the distribution, so the load can't be distributed. I doubt INFRA would like this.

Note that third parties could also start distributing use-case-oriented distributions, which would be perfectly fine as far as I'm concerned.

On 16/04/2020 16:57, Kurt Young wrote:

I'm not so sure about the web tool solution though. The concern I have with this approach is that the final generated distribution is kind of non-deterministic. We might generate too many different combinations when users package different types of connectors, formats, and maybe even Hadoop releases. As far as I can tell, most open source and Apache projects only release a few pre-defined distributions, which most users are already familiar with and which are thus hard to change IMO. I have also seen cases where users re-distribute the release package because of the unstable network to the Apache website from China.
With the web tool solution, I don't think this kind of re-distribution would be possible anymore.

In the meantime, I also have a concern that we will fall into our trap again if we try to offer this smart & flexible solution, because it needs users to cooperate with the mechanism. It's exactly the situation we currently fell into:
1. We offered a smart solution.
2. We hope users will follow the correct instructions.
3. Everything will work as expected if users followed the right instructions.

In reality, I suspect not all users will do the second step correctly. And for new users who are only trying to have a quick experience with Flink, I would bet most will do it wrong.

So, my proposal would be one of the following two options:
1. Provide a slim distribution for advanced production users, plus a distribution with some popular built-in jars.
2. Only provide a distribution with some popular built-in jars.

If we are trying to reduce the distributions we release, I would prefer 2 over 1.
Best,
Kurt

On Thu, Apr 16, 2020 at 9:33 PM Till Rohrmann <trohrm...@apache.org> wrote:

I think what Chesnay and Dawid proposed would be the ideal solution. Ideally, we would also have a nice web tool for the website which generates the corresponding distribution for download.

To get things started, we could begin with only supporting downloading/creating the "fat" version with the script. The fat version would then consist of the slim distribution plus whatever we deem important for new users to get started.

Cheers,
Till

On Thu, Apr 16, 2020 at 11:33 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

Hi all,

Few points from my side:

1. I like the idea of simplifying the experience for first-time users. As for production use cases, I share Jark's opinion that there I would expect users to combine their distribution manually; in such scenarios it is important to understand the interconnections. Personally, I'd expect the slimmest possible distribution that I can extend further with what I need in my production scenario.

2. I think there is also the problem that the matrix of possible useful combinations is already big. Do we want to have a distribution for:

SQL users: which connectors should we include? Should we include Hive? Which other catalog?

DataStream users: which connectors should we include?

For both of the above, should we include YARN/Kubernetes?

I would opt for providing only the "slim" distribution as a release artifact.

3. However, as I said, I think it's worth investigating how we can improve the user experience. What do you think of providing a tool, e.g. a shell script, that constructs a distribution based on the user's choices? I think that is also what Chesnay meant by "tooling to assemble custom distributions". In the end, the difference between a slim and a fat distribution is which jars we put into lib, right? It could have a few "screens":

1. Which API are you interested in?
   a. SQL API
   b. DataStream API

2. [SQL] Which connectors do you want to use? [multichoice]:
   a. Kafka
   b. Elasticsearch
   ...

3. [SQL] Which catalog do you want to use?

...

Such a tool would download all the dependencies from Maven and put them into the correct folder. In the future we could extend it with additional rules, e.g. kafka-0.9 cannot be chosen at the same time as kafka-universal.

The benefit would be that the distribution we release could remain "slim", or we could even make it slimmer. I might be missing something here, though.

Best,
Dawid

On 16/04/2020 11:02, Aljoscha Krettek wrote:

I want to reinforce my opinion from earlier: this is about improving the situation both for first-time users and for experienced users that want to use a Flink dist in production. The current Flink dist is too "thin" for first-time SQL users and too "fat" for production users; with the current middle ground we serve no-one properly. That's why I think introducing those specialized "spins" of Flink dist would be good.

By the way, at some point in the future production users might not even need to get a Flink dist anymore. They should be able to have Flink as a dependency of their project (including the runtime) and then build an image from this for Kubernetes, or a fat jar for YARN.
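At its core, the assembly tool Dawid describes would resolve the chosen connector/format artifacts and drop them into lib/ of a slim base distribution. A toy sketch working from a local artifact cache rather than a real Maven download; the function name, cache layout, and selection list are all hypothetical:

```shell
#!/usr/bin/env sh
# Toy sketch of the proposed distribution-assembly step: copy user-selected
# connector/format jars from a local cache into the distribution's lib/.
# assemble_dist and the cache layout are hypothetical, not an existing tool.
assemble_dist() {
    dist_dir="$1"; cache_dir="$2"; shift 2
    mkdir -p "$dist_dir/lib"
    for artifact in "$@"; do
        found=""
        # Match any version of the selected artifact in the cache.
        for jar in "$cache_dir/$artifact"-*.jar; do
            [ -e "$jar" ] && { cp "$jar" "$dist_dir/lib/"; found="yes"; }
        done
        if [ -z "$found" ]; then
            echo "error: no jar for '$artifact' in $cache_dir" >&2
            return 1
        fi
    done
}

# Usage sketch: assemble_dist ./flink-dist ./cache flink-csv flink-json
```

A real tool would also need the compatibility rules mentioned above (e.g. mutually exclusive Kafka connector versions) before copying anything.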
Aljoscha

On 15.04.20 18:14, wenlong.lwl wrote:

Hi all,

Regarding slim and fat distributions, I think different kinds of jobs may prefer different types of distribution.

For DataStream jobs, we may not want a fat distribution containing connectors, because the user always needs to depend on the connector in user code anyway, and it is easy to include the connector jar in the user lib. Fewer jars in lib means fewer class conflicts and problems.

For SQL jobs, we are trying to encourage users to construct their jobs with pure SQL (DDL + DML). To improve the user experience, it may be important for Flink not only to provide as many connector jars in the distribution as possible (especially the connectors and formats we have documented well), but also to provide a mechanism to load connectors according to the DDLs.

So I think it could be good to place connector/format jars in a directory like opt/connector, which would not affect jobs by default, and introduce a mechanism of dynamic discovery for SQL.

Best,
Wenlong

On Wed, 15 Apr 2020 at 22:46, Jingsong Li <jingsongl...@gmail.com> wrote:

Hi,

I am thinking about both "improve the first experience" and "improve the production experience".

I'm thinking about what the common mode of Flink is: streaming jobs use Kafka? Batch jobs use Hive?

Hive 1.2.1 dependencies are compatible with most Hive server versions, which is why Spark and Presto have a built-in Hive 1.2.1 dependency. Flink is currently mainly used for streaming, so let's not talk about Hive.

For streaming jobs, the jobs in my mind are (as related to connectors):
- ETL jobs: Kafka -> Kafka
- Join jobs: Kafka -> DimJDBC -> Kafka
- Aggregation jobs: Kafka -> JDBCSink

So Kafka and JDBC are probably the most commonly used, along with the CSV and JSON formats. So what if we provide a fat distribution:
- with CSV and JSON;
- with flink-kafka-universal and its Kafka dependencies;
- with flink-jdbc.

Using this fat distribution, most users can run their jobs well (a JDBC driver jar is still required, but that is very natural to add). Can these dependencies lead to conflicts?
Only Kafka may have conflicts, but if our goal is to use kafka-universal to support all Kafka versions, it should cover the vast majority of users.

We don't want to put every jar into the fat distribution, only common ones with few conflicts. Of course, which jars go into the fat distribution is a matter for consideration. We have the opportunity to help the majority of users while still leaving room for customization.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 10:09 PM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think we should first reach a consensus on "what problem do we want to solve?": (1) improve the first experience, or (2) improve the production experience?

As far as I can see from the above discussion, what we want to solve is the "first experience". And I think the slim jar is still the best distribution for production, because assembling jars is easier than excluding jars and avoids potential class conflicts.

If we want to improve the "first experience", I think it makes sense to have a fat distribution to give users a smoother first experience.
But I would like to call it a "playground distribution" or something like that, to explicitly distinguish it from the "slim production-purpose distribution". The "playground distribution" could contain some widely used jars, like the universal-kafka-sql-connector, elasticsearch7-sql-connector, avro, json, csv, etc. We could even provide a playground Docker image which contains the fat distribution, Python 3, and Hive.

Best,
Jark

On Wed, 15 Apr 2020 at 21:47, Chesnay Schepler <ches...@apache.org> wrote:

I don't see a lot of value in having multiple distributions.

The simple reality is that no fat distribution we could provide would satisfy all use-cases, so why even try? If users commonly run into issues for certain jars, then maybe those should be added to the current distribution.

Personally though, I still believe we should only distribute a slim version. I'd rather have users always add required jars to the distribution than only when they go outside our "expected" use-cases.
Then we might finally address this issue properly, i.e., tooling to assemble custom distributions and/or better error messages if Flink-provided extensions cannot be found.

On 15/04/2020 15:23, Kurt Young wrote:

Regarding the specific solution, I'm not sure about the "fat" and "slim" approach though. I get the idea that we can make the slim one even more lightweight than the current distribution, but what about the "fat" one? Do you mean that we would package all connectors and formats into it? I'm not sure that is feasible. For example, we can't put all versions of the Kafka and Hive connector jars into the lib directory, and we also might need Hadoop jars when using the filesystem connector to access data from HDFS.

So my guess would be that we hand-pick some of the most frequently used connectors and formats into our "lib" directory, like kafka, csv, and json mentioned above, and still leave some other connectors out. If that is the case, then why not just provide this distribution to users? I'm not sure I see the benefit of providing another super "slim" jar (we would have to pay some cost to maintain another suite of distributions).

What do you think?

Best,
Kurt

On Wed, Apr 15, 2020 at 7:08 PM Jingsong Li <jingsongl...@gmail.com> wrote:

Big +1.

I like "fat" and "slim".

For csv and json, as Jark said, they are quite small and don't have other dependencies. They are important to the Kafka connector, and important to the upcoming file system connector too. So can we put them in both "fat" and "slim"? They're so important, and they're so lightweight.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 4:53 PM godfrey he <godfre...@gmail.com> wrote:

Big +1.

This will improve the user experience (especially for new Flink users).
We answered so many questions about "class not found".

Best,
Godfrey

Dian Fu <dian0511...@gmail.com> wrote on Wed, Apr 15, 2020 at 4:30 PM:

+1 to this proposal.

Missing connector jars is also a big problem for PyFlink users. Currently, after a Python user has installed PyFlink using `pip`, they have to manually copy the connector fat jars into the PyFlink installation directory for the connectors to be usable in locally run jobs. This process is very confusing for users and hurts the experience a lot.

Regards,
Dian

On Apr 15, 2020 at 3:51 PM, Jark Wu <imj...@gmail.com> wrote:

+1 to the proposal. I also found the "download additional jar" step really verbose when preparing webinars.
At the very least, I think flink-csv and flink-json should be in the distribution; they are quite small and don't have other dependencies.

Best,
Jark

On Wed, 15 Apr 2020 at 15:44, Jeff Zhang <zjf...@gmail.com> wrote:

Hi Aljoscha,

Big +1 for the fat Flink distribution. Where do you plan to put these connectors, opt or lib?

Aljoscha Krettek <aljos...@apache.org> wrote on Wednesday, April 15, 2020 at 3:30 PM:

Hi Everyone,

I'd like to discuss releasing a more full-featured Flink distribution. The motivation is that there is friction for SQL/Table API users who want to use Table connectors that are not in the current Flink distribution.
For these users the workflow is currently roughly:

- download Flink dist
- configure the csv/Kafka/json connectors
- run the SQL client or program
- decipher the error message and research a solution
- download the additional connector jars
- program works correctly

I realize that this can be made to work, but if every SQL user has this as their first experience, that doesn't seem good to me.

My proposal is to provide two versions of the Flink distribution in the future, "fat" and "slim" (names to be discussed):

- slim would be even trimmer than today's distribution
- fat would contain a lot of convenience connectors (yet to be determined which ones)

And yes, I realize that there are already more dimensions to Flink releases (Scala version and Java version).
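The "download additional connector jars" step in the workflow above can be sketched as follows. This is an illustrative example only, assuming the standard Maven Central layout for Flink artifacts and version 1.10.0; the format jar versions must match the distribution version:

```shell
# Build the Maven Central download URLs for the small format jars that
# users currently have to fetch by hand and drop into Flink's lib/ directory.
FLINK_VERSION=1.10.0
BASE=https://repo1.maven.org/maven2/org/apache/flink

for fmt in flink-csv flink-json flink-avro; do
  url="${BASE}/${fmt}/${FLINK_VERSION}/${fmt}-${FLINK_VERSION}.jar"
  echo "$url"
  # From inside flink-${FLINK_VERSION}/lib/ one would then run:
  #   curl -O "$url"
done
```

With these three jars in lib/, a csv/json/avro table definition works out of the box instead of triggering the trial-and-error loop described above.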
For background, our current Flink dist has these in the opt directory:

- flink-azure-fs-hadoop-1.10.0.jar
- flink-cep-scala_2.12-1.10.0.jar
- flink-cep_2.12-1.10.0.jar
- flink-gelly-scala_2.12-1.10.0.jar
- flink-gelly_2.12-1.10.0.jar
- flink-metrics-datadog-1.10.0.jar
- flink-metrics-graphite-1.10.0.jar
- flink-metrics-influxdb-1.10.0.jar
- flink-metrics-prometheus-1.10.0.jar
- flink-metrics-slf4j-1.10.0.jar
- flink-metrics-statsd-1.10.0.jar
- flink-oss-fs-hadoop-1.10.0.jar
- flink-python_2.12-1.10.0.jar
- flink-queryable-state-runtime_2.12-1.10.0.jar
- flink-s3-fs-hadoop-1.10.0.jar
- flink-s3-fs-presto-1.10.0.jar
- flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
- flink-sql-client_2.12-1.10.0.jar
- flink-state-processor-api_2.12-1.10.0.jar
- flink-swift-fs-hadoop-1.10.0.jar

The current Flink dist is 267M. If we removed everything from opt we would go down to 126M. I would recommend this, because the large majority of the files in opt are probably unused.

What do you think?
Best,
Aljoscha

--
Best Regards

Jeff Zhang

--
Best, Benchao Li