The ML Pipeline idea was introduced by Scikit-learn <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this idea and built their own implementations [Spark ML Pipeline <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline <https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].
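
To make the pipeline idea concrete, here is a minimal sketch in plain Python. It is not Scikit-learn, Spark, or Flink code: the names MeanCenter, ThresholdClassifier, Pipeline, and STAGE_TYPES are hypothetical and exist only for illustration. It shows stages chained behind one fit/transform interface, and a fitted pipeline saved as readable JSON and reloaded for inference.

```python
import json


class MeanCenter:
    """Hypothetical feature stage: learns the mean and subtracts it."""

    def __init__(self, mean=0.0):
        self.mean = mean

    def fit(self, xs, ys):
        self.mean = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean for x in xs]

    def params(self):
        return {"mean": self.mean}


class ThresholdClassifier:
    """Hypothetical model stage: threshold is the midpoint of the class means."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def transform(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]

    def params(self):
        return {"threshold": self.threshold}


# Registry used to rebuild stages from their saved type names (an assumption
# of this sketch, not an existing API).
STAGE_TYPES = {cls.__name__: cls for cls in (MeanCenter, ThresholdClassifier)}


class Pipeline:
    """Chains stages; each stage is fitted on the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, xs, ys):
        for stage in self.stages:
            xs = stage.fit(xs, ys).transform(xs)
        return self

    def transform(self, xs):
        for stage in self.stages:
            xs = stage.transform(xs)
        return xs

    def to_json(self):
        # Human-readable description of every fitted stage.
        return json.dumps(
            [{"type": type(s).__name__, "params": s.params()} for s in self.stages],
            indent=2,
        )

    @classmethod
    def from_json(cls, text):
        return cls([STAGE_TYPES[d["type"]](**d["params"]) for d in json.loads(text)])


# Training phase: fit the whole pipeline once, then persist it.
fitted = Pipeline([MeanCenter(), ThresholdClassifier()]).fit(
    [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]
)
saved = fitted.to_json()

# Inference phase: reload the same logic and apply it to new data.
restored = Pipeline.from_json(saved)
predictions = restored.transform([0.5, 9.5])
```

Saving the stages as JSON rather than an opaque binary format is one way to keep the pipeline description readable by people and tools; a real implementation would need a richer parameter and versioning scheme.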
NOTE: although I use the term "ML", ML Pipeline applies to both ML and DL pipelines.

ML Pipeline is quite helpful for model composition (e.g. using one or more models for feature engineering). It also enables logic reuse between the training and inference phases (via pipeline persistence and loading), which is essential for AI engineering. ML Pipeline can also be a good base for a Flink-based AI engineering platform if we give it good tooling support (e.g. human-readable metadata).

As the Table API will be the unified high-level API for both stream and batch processing, I want to initiate the design discussion of a new Table-based Flink ML Pipeline. I drafted a design document [1] for this discussion. The design proposes a new ML Pipeline implementation so that concrete ML/DL algorithms can fit this new API and achieve interoperability.

Any feedback is highly appreciated.

Thanks,
Weihua

[1] https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing