Thanks for the suggestion, Till. I am curious how we usually decide when to put jars into the opt folder.

Technically speaking, it seems that `flink-ml-api` should be put into the opt directory because it is actually an API rather than a library, just like CEP and Table.

`flink-ml-lib` seems to be a borderline case. On one hand, it is a library. On the other hand, unlike the SQL formats and Hadoop, whose code mostly lives outside of Flink, the algorithm code is in Flink. So `flink-ml-lib` is more like the built-in SQL UDFs, and it seems fine to put it either in the opt folder or on the downloads page.

From the user experience perspective, it might be better to have both `flink-ml-lib` and `flink-ml-api` in the opt folder so users needn't go to two places for the required dependencies.

Thanks,

Jiangjie (Becket) Qin

On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng <he...@apache.org> wrote:

> Hi Till,
>
> Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
> libraries as optional dependencies on the download page, which can make the
> dist smaller.
>
> But I also have some concerns about it, e.g., the download page now only
> includes the latest 3 releases. We may need to find ways to support more
> versions.
> On the other hand, the flink-ml libraries are currently very small
> (about 246K), so they would not have much impact on the size of the dist.
>
> What do you think?
>
> Best,
> Hequn
>
> On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> An alternative solution would be to offer the flink-ml libraries as
>> optional dependencies on the download page, similar to how we offer the
>> different SQL formats and Hadoop releases [1].
>>
>> [1] https://flink.apache.org/downloads.html
>>
>> Cheers,
>> Till
>>
>> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng <he...@apache.org> wrote:
>>
>> > Thank you all for your feedback and suggestions!
>> >
>> > Best, Hequn
>> >
>> > On Mon, Feb 3, 2020 at 5:07 PM Becket Qin <becket....@gmail.com> wrote:
>> >
>> > > Thanks for bringing up the discussion, Hequn.
>> > >
>> > > +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would make
>> > > it much easier for the users to try out some simple ml tasks.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On Mon, Feb 3, 2020 at 4:34 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>> > >
>> > >> Thank you for pushing this forward, @Hequn Cheng <he...@apache.org>!
>> > >>
>> > >> Hi @Becket Qin <becket....@gmail.com>, do you have any concerns about
>> > >> this?
>> > >>
>> > >> Best,
>> > >> Jincheng
>> > >>
>> > >> On Mon, Feb 3, 2020 at 2:09 PM Hequn Cheng <he...@apache.org> wrote:
>> > >>
>> > >>> Hi everyone,
>> > >>>
>> > >>> Thanks for the feedback. As there are no objections, I've opened a JIRA
>> > >>> issue (FLINK-15847 [1]) to address this.
>> > >>> The implementation details can be discussed in the issue or in the
>> > >>> following PR.
>> > >>>
>> > >>> Best,
>> > >>> Hequn
>> > >>>
>> > >>> [1] https://issues.apache.org/jira/browse/FLINK-15847
>> > >>>
>> > >>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>> > >>>
>> > >>> > Hi Jincheng,
>> > >>> >
>> > >>> > Thanks a lot for your feedback!
>> > >>> > Yes, I agree with you. There are cases where multiple jars need to be
>> > >>> > uploaded. I will prepare another discussion later, maybe with a simple
>> > >>> > design doc.
>> > >>> >
>> > >>> > Best, Hequn
>> > >>> >
>> > >>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>> > >>> >
>> > >>> >> Thanks for bringing up this discussion, Hequn!
>> > >>> >>
>> > >>> >> +1 for including `flink-ml-api` and `flink-ml-lib` in opt.
>> > >>> >>
>> > >>> >> BTW: I think it would be great to bring up a discussion about uploading
>> > >>> >> multiple jars at the same time, as PyFlink jobs can also benefit if we
>> > >>> >> make that improvement.
>> > >>> >>
>> > >>> >> Best,
>> > >>> >> Jincheng
>> > >>> >>
>> > >>> >> On Wed, Jan 8, 2020 at 11:50 AM Hequn Cheng <chenghe...@gmail.com> wrote:
>> > >>> >>
>> > >>> >> > Hi everyone,
>> > >>> >> >
>> > >>> >> > FLIP-39 [1] rebuilds the Flink ML pipeline on top of the Table API, which
>> > >>> >> > moves Flink ML a step further. Based on it, users can develop their ML
>> > >>> >> > jobs, and more and more machine learning platforms are providing ML
>> > >>> >> > services.
>> > >>> >> >
>> > >>> >> > However, the problem now is that the jars of flink-ml-api and
>> > >>> >> > flink-ml-lib only exist in the Maven repo. Whenever users want to submit
>> > >>> >> > ML jobs, they can only depend on the ml modules and package a fat jar.
>> > >>> >> > This is inconvenient, especially for machine learning platforms on which
>> > >>> >> > nearly all jobs depend on the Flink ML modules and have to package a fat
>> > >>> >> > jar.
>> > >>> >> >
>> > >>> >> > Given this, it would be better to include the jars of flink-ml-api and
>> > >>> >> > flink-ml-lib in the `opt` folder, so that users can directly use the jars
>> > >>> >> > with the binary release. For example, users can move the jars into the
>> > >>> >> > `lib` folder or use -j to upload the jars. (Currently, -j only supports
>> > >>> >> > uploading one jar. Supporting multiple jars for -j can be covered in
>> > >>> >> > another discussion.)
>> > >>> >> >
>> > >>> >> > The jars go in the `opt` folder instead of the `lib` folder because
>> > >>> >> > currently the ml jars are still optional for the Flink project by
>> > >>> >> > default.
>> > >>> >> >
>> > >>> >> > What do you think? Any feedback is welcome!
>> > >>> >> >
>> > >>> >> > Best,
>> > >>> >> >
>> > >>> >> > Hequn
>> > >>> >> >
>> > >>> >> > [1]
>> > >>> >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs