Thanks for the suggestion, Till.

I am curious: how do we usually decide when to put jars into the opt
folder?

Technically speaking, it seems that `flink-ml-api` should be put into the
opt directory because it is actually an API rather than a library, just
like CEP and Table.

`flink-ml-lib` seems to be a borderline case. On the one hand, it is a
library. On the other hand, unlike the SQL formats and Hadoop, whose main
code lives outside of Flink, the algorithm code lives in Flink. So
`flink-ml-lib` is more like the built-in SQL UDFs, and it seems fine to put
it either in the opt folder or on the downloads page.

From the user experience perspective, it might be better to have both
`flink-ml-lib` and `flink-ml-api` in the opt folder so users needn't go to
two places for the required dependencies.

Thanks,

Jiangjie (Becket) Qin

On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng <he...@apache.org> wrote:

> Hi Till,
>
> Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
> libraries as optional dependencies on the download page, which would make
> the dist smaller.
>
> But I also have some concerns about it. For example, the download page
> currently only includes the latest three releases, so we may need to find
> a way to support more versions.
> On the other hand, the flink-ml libraries are currently very small (about
> 246K), so they would not have much impact on the size of the dist.
>
> What do you think?
>
> Best,
> Hequn
>
> On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> An alternative solution would be to offer the flink-ml libraries as
>> optional dependencies on the download page, similar to how we offer the
>> different SQL formats and Hadoop releases [1].
>>
>> [1] https://flink.apache.org/downloads.html
>>
>> Cheers,
>> Till
>>
>> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng <he...@apache.org> wrote:
>>
>> > Thank you all for your feedback and suggestions!
>> >
>> > Best, Hequn
>> >
>> > On Mon, Feb 3, 2020 at 5:07 PM Becket Qin <becket....@gmail.com> wrote:
>> >
>> > > Thanks for bringing up the discussion, Hequn.
>> > >
>> > > +1 on adding `flink-ml-api` and `flink-ml-lib` to opt. This would make
>> > > it much easier for users to try out some simple ML tasks.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On Mon, Feb 3, 2020 at 4:34 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>> > >
>> > >> Thank you for pushing this forward, @Hequn Cheng <he...@apache.org>!
>> > >>
>> > >> Hi @Becket Qin <becket....@gmail.com>, do you have any concerns about
>> > >> this?
>> > >>
>> > >> Best,
>> > >> Jincheng
>> > >>
>> > >> On Mon, Feb 3, 2020 at 2:09 PM Hequn Cheng <he...@apache.org> wrote:
>> > >>
>> > >>> Hi everyone,
>> > >>>
>> > >>> Thanks for the feedback. As there are no objections, I've opened a JIRA
>> > >>> issue (FLINK-15847 [1]) to track this.
>> > >>> The implementation details can be discussed in the issue or in the
>> > >>> following PR.
>> > >>>
>> > >>> Best,
>> > >>> Hequn
>> > >>>
>> > >>> [1] https://issues.apache.org/jira/browse/FLINK-15847
>> > >>>
>> > >>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>> > >>>
>> > >>> > Hi Jincheng,
>> > >>> >
>> > >>> > Thanks a lot for your feedback!
>> > >>> > Yes, I agree with you. There are cases where multiple jars need to be
>> > >>> > uploaded. I will start a separate discussion later, perhaps with a
>> > >>> > simple design doc.
>> > >>> >
>> > >>> > Best, Hequn
>> > >>> >
>> > >>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>> > >>> >
>> > >>> >> Thanks for bringing up this discussion, Hequn!
>> > >>> >>
>> > >>> >> +1 for including `flink-ml-api` and `flink-ml-lib` in opt.
>> > >>> >>
>> > >>> >> BTW, I think it would be great to start a discussion about uploading
>> > >>> >> multiple jars at the same time, as PyFlink jobs would also benefit
>> > >>> >> from that improvement.
>> > >>> >>
>> > >>> >> Best,
>> > >>> >> Jincheng
>> > >>> >>
>> > >>> >>
>> > >>> >> On Wed, Jan 8, 2020 at 11:50 AM Hequn Cheng <chenghe...@gmail.com> wrote:
>> > >>> >>
>> > >>> >> > Hi everyone,
>> > >>> >> >
>> > >>> >> > FLIP-39 [1] rebuilds the Flink ML pipeline on top of the Table API,
>> > >>> >> > which moves Flink ML a step further. Based on it, users can develop
>> > >>> >> > their own ML jobs, and more and more machine learning platforms are
>> > >>> >> > providing ML services.
>> > >>> >> >
>> > >>> >> > However, the problem now is that the flink-ml-api and flink-ml-lib
>> > >>> >> > jars only exist in the Maven repository. Whenever users want to
>> > >>> >> > submit ML jobs, they have to depend on the ML modules and package a
>> > >>> >> > fat jar. This is inconvenient, especially for machine learning
>> > >>> >> > platforms on which nearly all jobs depend on the Flink ML modules
>> > >>> >> > and therefore have to package a fat jar.
>> > >>> >> >
>> > >>> >> > Given this, it would be better to include the flink-ml-api and
>> > >>> >> > flink-ml-lib jars in the `opt` folder, so that users can use them
>> > >>> >> > directly with the binary release. For example, users can move the
>> > >>> >> > jars into the `lib` folder or use -j to upload them. (Currently, -j
>> > >>> >> > only supports uploading a single jar. Supporting multiple jars for
>> > >>> >> > -j can be covered in a separate discussion.)
>> > >>> >> >
>> > >>> >> > The jars go in the `opt` folder instead of the `lib` folder
>> > >>> >> > because the ML jars are currently still optional for the Flink
>> > >>> >> > project by default.
>> > >>> >> >
>> > >>> >> > What do you think? Any feedback is welcome!
>> > >>> >> >
>> > >>> >> > Best,
>> > >>> >> >
>> > >>> >> > Hequn
>> > >>> >> >
>> > >>> >> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>> > >>> >> >
>> > >>> >>
>> > >>> >
>> > >>>
>> > >>
>> >
>>
>
