Re: [DISCUSS] Embracing Table API in Flink ML

Weihua Jiang Tue, 20 Nov 2018 18:14:01 -0800

Hi Yun,

Can't wait to see your design.


Thanks
Weihua

On Wed, 21 Nov 2018 at 12:43 AM, Yun Gao <yungao...@aliyun.com.invalid> wrote:

> Hi Weihua,
>
>     Thanks for the exciting proposal!
>
>     I have quickly read through it, and I really appreciate the idea of
> providing an ML Pipeline API similar to the commonly used library
> scikit-learn, since it greatly reduces the learning cost for AI
> engineers moving to the Flink platform.
>
>     Currently we are also working on a related issue, namely enhancing the
> stream iteration of Flink to support both SGD and online learning; it
> also supports batch training as a special case. We have a rough design
> and will start a new discussion in the next few days. I think the enhanced
> stream iteration will help to implement Estimators directly in Flink, and
> it may help to simplify the online learning pipeline by eliminating the
> requirement to load the models from external file systems.
>
>     I will read the design doc more carefully. Thanks again for sharing
> the design doc!
>
> Yours sincerely
>     Yun Gao
>
>
> ------------------------------------------------------------------
> From: Weihua Jiang <weihua.ji...@gmail.com>
> Sent: Tuesday, 20 Nov 2018, 20:53
> To: dev <dev@flink.apache.org>
> Subject: [DISCUSS] Embracing Table API in Flink ML
>
> ML Pipeline is an idea introduced by scikit-learn
> <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this
> idea and made their own implementations [Spark ML Pipeline
> <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline
> <
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html
> >].
>
>
>
> NOTE: though I am using the term "ML", ML Pipeline shall apply to both ML
> and DL pipelines.
>
>
> ML Pipeline is quite helpful for model composition (e.g. using model(s) for
> feature engineering). It also enables logic reuse across the training and
> inference phases (via pipeline persistence and loading), which is essential
> for AI engineering. ML Pipeline can also be a good base for a Flink-based AI
> engineering platform if we can give ML Pipeline good tooling support
> (e.g. human-readable metadata).
>
>
> As the Table API will be the unified high-level API for both stream and
> batch processing, I want to initiate the design discussion of a new
> Table-based Flink ML Pipeline.
>
>
> I drafted a design document [1] for this discussion. This design tries to
> create a new ML Pipeline implementation so that concrete ML/DL algorithms
> can fit into this new API to achieve interoperability.
>
>
> Any feedback is highly appreciated.
>
>
> Thanks
>
> Weihua
>
>
> [1]
>
> https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing
>
>
