Hi Weihua,

Thanks for the well-written design doc!
The ML pipeline abstraction is very handy for AI engineers. As Jincheng mentioned, there is an ongoing effort to enhance the Table API for ML. It would still be helpful to understand what the Table API is missing to fully support the ML pipeline.

Given that there are quite a few proposed APIs and related items to discuss, do you think some examples of how the pipeline works would facilitate the discussion?

Again, thanks for kicking off the discussion.

Jiangjie (Becket) Qin

On Tue, Nov 20, 2018 at 9:17 PM jincheng sun <sunjincheng...@gmail.com> wrote:

> Hi Weihua,
>
> Thanks for bringing up this discussion!
>
> I quickly read the Google doc, and I fully agree that ML can be well
> supported on the Table API (at some stage in the future).
> In fact, Xiaowei and I have already brought up a discussion on enhancing
> the Table API. In the first phase, we will add support for
> map/flatmap/agg/flatagg in the Table API.
> So I am very happy to be involved in this discussion and will leave a
> comment in the Google doc later.
>
> I think it would be great if you could add a phased implementation plan to
> the Google doc. What do you think?
>
> Thanks,
> Jincheng
>
> Weihua Jiang <weihua.ji...@gmail.com> wrote on Tue, Nov 20, 2018 at 8:53 PM:
>
> > ML Pipeline is the idea brought by Scikit-learn
> > <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this
> > idea and made their own implementations [Spark ML Pipeline
> > <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline
> > <https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].
> >
> > NOTE: though I am using the term "ML", ML Pipeline shall apply to both ML
> > and DL pipelines.
> >
> > ML Pipeline is quite helpful for model composition (i.e. using model(s) for
> > feature engineering). It also enables logic reuse across the training and
> > inference phases (via pipeline persistence and load), which is essential
> > for AI engineering.
> > ML Pipeline can also be a good base for a Flink-based AI engineering
> > platform if we can give ML Pipeline good tooling support (i.e.
> > human-readable metadata).
> >
> > As the Table API will be the unified high-level API for both stream and
> > batch processing, I want to initiate the design discussion of a new
> > Table-based Flink ML Pipeline.
> >
> > I drafted a design document [1] for this discussion. This design tries to
> > create a new ML Pipeline implementation so that concrete ML/DL algorithms
> > can fit this new API to achieve interoperability.
> >
> > Any feedback is highly appreciated.
> >
> > Thanks
> >
> > Weihua
> >
> > [1]
> > https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing