I would not object, given that it is rather small at the moment. However, I
also think that we should have a plan for handling the ever-growing Flink
ecosystem and for making it easily accessible to our users. For example, one
far-fetched idea could be a configuration script that downloads
the required components for the user. But this definitely deserves a
separate discussion and does not really belong here.

Cheers,
Till
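
As an illustration of the opt/ versus lib/ point raised in the quoted
messages below (jars shipped in opt/ still have to be moved to lib/ before
they are picked up), here is a minimal Python sketch of that step. It is
only a sketch under stated assumptions: the function name
`enable_optional_jars`, the `flink-ml` file-name prefix, and the
opt/ and lib/ layout directly under the Flink home directory are
hypothetical choices for the example, not something this thread specifies.

```python
import shutil
from pathlib import Path


def enable_optional_jars(flink_home, prefix="flink-ml"):
    """Copy jars whose file name starts with `prefix` from opt/ to lib/.

    Hypothetical helper: the prefix and the directory layout are assumptions.
    Returns the sorted file names of the jars that were actually copied.
    """
    home = Path(flink_home)
    opt_dir = home / "opt"
    lib_dir = home / "lib"
    lib_dir.mkdir(exist_ok=True)

    copied = []
    for jar in sorted(opt_dir.glob(prefix + "*.jar")):
        target = lib_dir / jar.name
        if not target.exists():  # leave jars that are already in lib/ alone
            shutil.copy2(jar, target)
            copied.append(jar.name)
    return copied
```

Copying rather than moving keeps opt/ unchanged, so the step can be repeated
safely; a second call copies nothing because the jars already exist in lib/.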

On Thu, Feb 6, 2020 at 3:35 PM Hequn Cheng <he...@apache.org> wrote:

>
> Hi everyone,
>
> Thank you all for the great inputs!
>
> I think what we probably all agree on is that we should try to make a
> leaner flink-dist. However, we may also need to make some compromises for
> the sake of user experience, so that users don't need to download the
> dependencies from different places. Otherwise, we could move all the jars
> in the current opt folder to the download page.
>
> The lack of clear rules for guiding such compromises makes things more
> complicated now. I would agree that the decisive factor for what goes into
> Flink's binary distribution should be how core it is to Flink. Meanwhile,
> it would be better to treat the Flink APIs as core to Flink. Not only is
> this a very clear rule that is easy to follow, but in most cases an API is
> also very significant and deserves to be included in the dist.
>
> Given this, it might make sense to put flink-ml-api and flink-ml-lib into
> the opt folder.
> What do you think?
>
> Best,
> Hequn
>
> On Wed, Feb 5, 2020 at 12:39 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Around a year ago I started a discussion
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Towards-a-leaner-flink-dist-tp25615.html>
>> on reducing the number of jars we ship with the distribution.
>>
>> While there was no definitive conclusion, there was a shared sentiment
>> that APIs should be shipped with the distribution.
>>
>> On 04/02/2020 17:25, Till Rohrmann wrote:
>>
>> I think there is no such rule that APIs go automatically into opt/ and
>> "libraries" do not. The contents of opt/ have mainly grown over time w/o
>> following a strict rule.
>>
>> I think the decisive factor for what goes into Flink's binary distribution
>> should be how core it is to Flink. Of course, another important
>> consideration is which use cases Flink should promote "out of the box" (not
>> sure whether this is actually true for content shipped in opt/, because you
>> also have to move it to lib).
>>
>> For example, Gelly is something I would rather see as an optional
>> component than ship with every Flink binary distribution.
>>
>> Cheers,
>> Till
>>
>> On Tue, Feb 4, 2020 at 11:24 AM Becket Qin <becket....@gmail.com> wrote:
>>
>> Thanks for the suggestion, Till.
>>
>> I am curious how we usually decide when to put jars into the opt
>> folder.
>>
>> Technically speaking, it seems that `flink-ml-api` should be put into the
>> opt directory because it is actually an API rather than a library, just
>> like CEP and Table.
>>
>> `flink-ml-lib` seems to be on the border. On one hand, it is a library. On
>> the other hand, unlike the SQL formats and Hadoop, whose main code lives
>> outside of Flink, the algorithm code is in Flink. So `flink-ml-lib` is more
>> like the built-in SQL UDFs. So it seems fine to put it either in the opt
>> folder or on the downloads page.
>>
>> From the user experience perspective, it might be better to have both
>> `flink-ml-lib` and `flink-ml-api` in the opt folder so users needn't go to
>> two places for the required dependencies.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng <he...@apache.org> wrote:
>>
>> Hi Till,
>>
>> Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
>> libraries as optional dependencies on the download page, which can make
>> the dist smaller.
>>
>> But I also have some concerns about it, e.g., the download page now only
>> includes the latest 3 releases. We may need to find ways to support more
>> versions.
>> On the other hand, the size of the flink-ml libraries is currently very
>> small (about 246K), so it would not have much impact on the size of dist.
>>
>> What do you think?
>>
>> Best,
>> Hequn
>>
>> On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> An alternative solution would be to offer the flink-ml libraries as
>> optional dependencies on the download page, similar to how we offer the
>> different SQL formats and Hadoop releases [1].
>>
>> [1] https://flink.apache.org/downloads.html
>>
>> Cheers,
>> Till
>>
>> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng <he...@apache.org> wrote:
>>
>> Thank you all for your feedback and suggestions!
>>
>> Best, Hequn
>>
>> On Mon, Feb 3, 2020 at 5:07 PM Becket Qin <becket....@gmail.com> wrote:
>>
>> Thanks for bringing up the discussion, Hequn.
>>
>> +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would make
>> it much easier for the users to try out some simple ml tasks.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Mon, Feb 3, 2020 at 4:34 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>>
>> Thank you for pushing this forward, @Hequn Cheng <he...@apache.org>!
>>
>> Hi @Becket Qin <becket....@gmail.com>, do you have any concerns about
>> this?
>>
>> Best,
>> Jincheng
>>
>> On Mon, Feb 3, 2020 at 2:09 PM Hequn Cheng <he...@apache.org> wrote:
>>
>> Hi everyone,
>>
>> Thanks for the feedback. As there are no objections, I've opened a JIRA
>> issue (FLINK-15847 [1]) to address this.
>> The implementation details can be discussed in the issue or in the
>> following PR.
>>
>> Best,
>> Hequn
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-15847
>>
>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>>
>> Hi Jincheng,
>>
>> Thanks a lot for your feedback!
>> Yes, I agree with you. There are cases where multiple jars need to be
>> uploaded. I will prepare another discussion later, maybe with a simple
>> design doc.
>>
>> Best, Hequn
>>
>> On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>>
>> Thanks for bringing up this discussion, Hequn!
>>
>> +1 for including `flink-ml-api` and `flink-ml-lib` in opt.
>>
>> BTW: I think it would be great to bring up a discussion about uploading
>> multiple jars at the same time, as PyFlink jobs could also benefit from
>> that improvement.
>>
>> Best,
>> Jincheng
>>
>>
>> On Wed, Jan 8, 2020 at 11:50 AM Hequn Cheng <chenghe...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> FLIP-39 [1] rebuilds the Flink ML pipeline on top of the Table API, which
>> moves Flink ML a step further. Based on it, users can develop their ML
>> jobs, and more and more machine learning platforms are providing ML
>> services.
>>
>> However, the problem now is that the jars of flink-ml-api and flink-ml-lib
>> exist only in the Maven repo. Whenever users want to submit ML jobs, they
>> can only depend on the ml modules and package a fat jar. This is
>> inconvenient, especially for machine learning platforms on which nearly
>> all jobs depend on the Flink ML modules and have to package a fat jar.
>>
>> Given this, it would be better to include the jars of flink-ml-api and
>> flink-ml-lib in the `opt` folder, so that users can directly use the jars
>> with the binary release. For example, users can move the jars into the
>> `lib` folder or use -j to upload the jars. (Currently, -j only supports
>> uploading one jar. Supporting multiple jars for -j can be discussed in
>> another discussion.)
>>
>> The jars go into the `opt` folder rather than the `lib` folder because
>> the ml jars are currently still optional for the Flink project by default.
>>
>> What do you think? Any feedback is welcome!
>>
>> Best,
>>
>> Hequn
>>
>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>>
