Hi Mingliang,

Thank you for providing a real-world case of a heterogeneous topology across the training and inference phases. Becket has given you two options to choose from.
Personally, I think the descriptions of Becket's two options are over-simplified and may be somewhat misleading, so I would like to add some thoughts:

1. Proposal Option-2 does NOT have to implement two DAGs in all cases. In most cases, the best practice with Option-2 is to put the common part (the inference part) into a Pipeline. In the training phase, the data is preprocessed by AlgoOps or by another pipeline, and then fed to Pipeline.fit(). The resulting PipelineModel can be used directly in the inference phase. The code is much clearer and cleaner than the complicated manipulation of estimatorInputs and transformerInputs in the Graph API.

2. Proposal Option-1 can NOT always encapsulate a heterogeneous topology with the Graph/GraphBuilder API. In [1], we list cases where the Graph API fails to encapsulate a complicated topology, together with the concrete scenarios in which we encountered them. This limitation could also bring extra effort when developing an ML task incrementally.

3. Even if Mingliang's cases happen to fall in the rare situations where both of Becket's options apply, [1] shows the actual differences between the two options in code snippets. I personally do not think implementing two DAGs brings much overhead; you may check those snippets if you like.

4. As far as I can see, most inference/predict pipelines are used for online serving (in offline inference there is no need to export models). In online serving, the corresponding pipeline can only accept one dataset and produce one dataset, which means item 1 above applies: Proposal Option-2 does the same thing in a clear and clean way.

So, Mingliang, if it does not bother you too much, could you give more information about your scenarios, keeping the supplementary information above in mind?
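To make item 1 concrete, here is a minimal, self-contained Python sketch of the Option-2 pattern: preprocessing lives outside the pipeline, only the common (inference) part is a Pipeline, and the fitted PipelineModel is reused as-is for inference. All names here (Pipeline, PipelineModel, Scale, the preprocessing step) are simplified stand-ins invented for illustration; they are NOT the actual Flink ML API.

```python
class Scale:
    """Toy estimator-like stage: learns a scaling factor in fit()."""
    def __init__(self):
        self.factor = 1.0

    def fit(self, data):
        self.factor = max(data) or 1.0  # learned parameter
        return self

    def transform(self, data):
        return [x / self.factor for x in data]


class Pipeline:
    """Chains stages; fit() trains each stage on the running output."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, data):
        fitted = []
        for stage in self.stages:
            stage = stage.fit(data)
            data = stage.transform(data)
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """The fitted pipeline, usable directly for inference."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data


# Training phase: preprocessing happens OUTSIDE the pipeline
# (a stand-in for AlgoOps or another pipeline)...
raw_training_data = [2.0, 4.0, 8.0]
preprocessed = [x * 2 for x in raw_training_data]

# ...and only the common (inference) part is a Pipeline.
model = Pipeline([Scale()]).fit(preprocessed)

# Inference phase: the same PipelineModel is used directly,
# with no training-only preprocessing in front of it.
print(model.transform([4.0, 16.0]))  # -> [0.25, 1.0]
```

The point of the sketch is that the training-only DAG (the preprocessing) and the shared pipeline compose naturally in user code, without a second pipeline definition or any Graph-style input rewiring.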
[1] https://docs.google.com/document/d/1L3aI9LjkcUPoM52liEY6uFktMnFMNFQ6kXAjnz_11do

Sincerely,
Fan Hong

------------------------------------------------------------------
From: 青雉(祁明良) <m...@xiaohongshu.com>
Sent: Tuesday, August 10, 2021, 11:36
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

Vote for option 2. It is similar to what we are doing with Tensorflow.

1. Define the graph in the training phase
2. Export the model with a different input/output spec for online inference

Thanks,
Mingliang