I would not object, given that it is rather small at the moment. However, I also think that we should have a plan for how to handle the ever-growing Flink ecosystem and how to make it easily accessible to our users. E.g., one far-fetched idea could be something like a configuration script which downloads the required components for the user. But this definitely deserves a separate discussion and does not really belong here.
Cheers,
Till

On Thu, Feb 6, 2020 at 3:35 PM Hequn Cheng <he...@apache.org> wrote:

>
> Hi everyone,
>
> Thank you all for the great inputs!
>
> I think probably what we all agree on is we should try to make a leaner
> flink-dist. However, we may also need to do some compromises considering
> the user experience that users don't need to download the dependencies from
> different places. Otherwise, we can move all the jars in the current opt
> folder to the download page.
>
> The missing of clear rules for guiding such compromises makes things more
> complicated now. I would agree that the decisive factor for what goes into
> Flink's binary distribution should be how core it is to Flink. Meanwhile,
> it's better to treat Flink API as a (core) core to Flink. Not only it is a
> very clear rule that easy to be followed but also in most cases, API is
> very significant and deserved to be included in the dist.
>
> Given this, it might make sense to put flink-ml-api and flink-ml-lib into
> the opt.
> What do you think?
>
> Best,
> Hequn
>
> On Wed, Feb 5, 2020 at 12:39 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Around a year ago I started a discussion
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Towards-a-leaner-flink-dist-tp25615.html>
>> on reducing the amount of jars we ship with the distribution.
>>
>> While there was no definitive conclusion there was a shared sentiment
>> that APIs should be shipped with the distribution.
>>
>> On 04/02/2020 17:25, Till Rohrmann wrote:
>>
>> I think there is no such rule that APIs go automatically into opt/ and
>> "libraries" not. The contents of opt/ have mainly grown over time w/o
>> following a strict rule.
>>
>> I think the decisive factor for what goes into Flink's binary distribution
>> should be how core it is to Flink.
>> Of course another important consideration is which use cases Flink
>> should promote "out of the box" (not sure whether this is actual true for
>> content shipped in opt/ because you also have to move it to lib).
>>
>> For example, Gelly would be an example which I would rather see as an
>> optional component than shipping it with every Flink binary distribution.
>>
>> Cheers,
>> Till
>>
>> On Tue, Feb 4, 2020 at 11:24 AM Becket Qin <becket....@gmail.com> wrote:
>>
>> Thanks for the suggestion, Till.
>>
>> I am curious about how do we usually decide when to put the jars into the
>> opt folder?
>>
>> Technically speaking, it seems that `flink-ml-api` should be put into the
>> opt directory because they are actually API instead of libraries, just like
>> CEP and Table.
>>
>> `flink-ml-lib` seems to be on the border. On one hand, it is a library. On
>> the other hand, unlike SQL formats and Hadoop whose major code are outside
>> of Flink, the algorithm codes are in Flink. So `flink-ml-lib` is more like
>> those of built-in SQL UDFs. So it seems fine to either put it in the opt
>> folder or in the downloads page.
>>
>> From the user experience perspective, it might be better to have both
>> `flink-ml-lib` and `flink-ml-api` in opt folder so users needn't go to two
>> places for the required dependencies.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng <he...@apache.org> wrote:
>>
>> Hi Till,
>>
>> Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
>> libraries as optional dependencies on the download page which can make the
>> dist smaller.
>>
>> But I also have some concerns for it, e.g., the download page now only
>> includes the latest 3 releases. We may need to find ways to support more
>> versions.
>> On the other hand, the size of the flink-ml libraries now is very
>> small (about 246K), so it would not bring much impact on the size of dist.
>>
>> What do you think?
>>
>> Best,
>> Hequn
>>
>> On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> An alternative solution would be to offer the flink-ml libraries as
>> optional dependencies on the download page. Similar to how we offer the
>> different SQL formats and Hadoop releases [1].
>>
>> [1] https://flink.apache.org/downloads.html
>>
>> Cheers,
>> Till
>>
>> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng <he...@apache.org> wrote:
>>
>> Thank you all for your feedback and suggestions!
>>
>> Best, Hequn
>>
>> On Mon, Feb 3, 2020 at 5:07 PM Becket Qin <becket....@gmail.com> wrote:
>>
>> Thanks for bringing up the discussion, Hequn.
>>
>> +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would make
>> it much easier for the users to try out some simple ml tasks.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Mon, Feb 3, 2020 at 4:34 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>>
>> Thank you for pushing forward @Hequn Cheng <he...@apache.org>!
>>
>> Hi @Becket Qin <becket....@gmail.com>, do you have any concerns on
>> this?
>>
>> Best,
>> Jincheng
>>
>> On Mon, Feb 3, 2020 at 2:09 PM Hequn Cheng <he...@apache.org> wrote:
>>
>> Hi everyone,
>>
>> Thanks for the feedback. As there are no objections, I've opened a JIRA
>> issue(FLINK-15847[1]) to address this issue.
>> The implementation details can be discussed in the issue or in the
>> following PR.
>>
>> Best,
>> Hequn
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-15847
>>
>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>>
>> Hi Jincheng,
>>
>> Thanks a lot for your feedback!
>> Yes, I agree with you. There are cases that multi jars need to be
>> uploaded. I will prepare another discussion later. Maybe with a simple
>> design doc.
>>
>> Best, Hequn
>>
>> On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>>
>> Thanks for bring up this discussion Hequn!
>>
>> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
>>
>> BTW: I think would be great if bring up a discussion for upload multiple
>> Jars at the same time. as PyFlink JOB also can have the benefit if we do
>> that improvement.
>>
>> Best,
>> Jincheng
>>
>> On Wed, Jan 8, 2020 at 11:50 AM Hequn Cheng <chenghe...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI which moves Flink
>> ML a step further. Base on it, users can develop their ML jobs and more and
>> more machine learning platforms are providing ML services.
>>
>> However, the problem now is the jars of flink-ml-api and flink-ml-lib are
>> only exist on maven repo. Whenever users want to submit ML jobs, they can
>> only depend on the ml modules and package a fat jar. This would be
>> inconvenient especially for the machine learning platforms on which nearly
>> all jobs depend on Flink ML modules and have to package a fat jar.
>>
>> Given this, it would be better to include jars of flink-ml-api and
>> flink-ml-lib in the `opt` folder, so that users can directly use the jars
>> with the binary release.
>> For example, users can move the jars into the
>> `lib` folder or use -j to upload the jars. (Currently, -j only support
>> upload one jar. Supporting multi jars for -j can be discussed in another
>> discussion.)
>>
>> Putting the jars in the `opt` folder instead of the `lib` folder is because
>> currently, the ml jars are still optional for the Flink project by default.
>>
>> What do you think? Welcome any feedback!
>>
>> Best,
>> Hequn
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs