Hi, Mingliang,

Thank you for providing a real-world case of heterogeneous topologies in the 
training and inference phases. Becket has given you two options to choose from.

Personally, I think the descriptions of Becket's two options are over-simplified 
and may be somewhat misleading. I would like to add some of my own thoughts:
1. Proposal Option-2 does NOT have to implement two DAGs in ALL cases. In most 
cases, the best practice with Proposal Option-2 is to put the common part (the 
inference part) into a Pipeline. In the training phase, the data is preprocessed 
by AlgoOps or another pipeline and then fed to Pipeline.fit(). The resulting 
PipelineModel can be used directly in the inference phase. The code will be much 
clearer and cleaner than the complicated manipulation of estimatorInputs and 
transformerInput in the Graph API.
2. Proposal Option-1 can NOT ALWAYS encapsulate the heterogeneous topology with 
the Graph/GraphBuilder API. In [1], we have already listed some cases where the 
Graph API fails to encapsulate complicated topologies, and we have also 
presented concrete scenarios we encountered. Such an incapability could bring 
extra effort when incrementally developing your ML task.
3. EVEN IF Mingliang's case happens to fall into the rare situations where both 
of Becket's options apply, the actual differences between the two options are 
shown in code snippets in [1]. I personally do not think implementing two DAGs 
brings much overhead. You may check those code snippets if you like.
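To make the shape of the Option-2 workflow in point 1 concrete, here is a 
minimal sketch. Note the caveats: this is plain Python with invented stand-in 
classes (Pipeline, PipelineModel, a toy Scale estimator), NOT the actual Flink 
ML / FLIP-173 API; only the structure of the code matters here — training-only 
preprocessing stays outside the fitted pipeline, so the fitted model already 
matches the inference topology.

```python
# Hypothetical stand-ins for illustration only; not the Flink ML classes.

class PipelineModel:
    """Result of fitting: one fitted transform function per stage."""
    def __init__(self, transforms):
        self.transforms = transforms

    def transform(self, data):
        for t in self.transforms:
            data = t(data)
        return data

class Pipeline:
    """Holds only the stages shared by training and inference."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, data):
        transforms = []
        for stage in self.stages:
            t = stage.fit(data)   # each stage learns from the incoming data
            data = t(data)        # its output feeds the next stage
            transforms.append(t)
        return PipelineModel(transforms)

class Scale:
    """Toy estimator: learns a factor so that max(output) == 1."""
    def fit(self, data):
        factor = max(data)
        return lambda xs: [x / factor for x in xs]

# Training phase: preprocessing happens OUTSIDE the pipeline
# (an AlgoOp-style step; here a toy string-to-float parse).
raw_training_data = ["2", "4", "8"]
preprocessed = [float(x) for x in raw_training_data]

model = Pipeline([Scale()]).fit(preprocessed)

# Inference phase: the model is applied directly, with no trace of
# the training-only preprocessing step.
print(model.transform([4.0]))  # -> [0.5]
```

The same PipelineModel object serves both phases, which is exactly the "one 
DAG" most Option-2 users would write in practice.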

As far as I can see, most inference/prediction pipelines are used for online 
serving (in offline inference, there is no need to export models). In the 
online-serving situation, the corresponding pipeline can only accept one 
dataset and produce one dataset. This means item 1 above applies: Proposal 
Option-2 does the same thing in a clear and clean way.
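To spell out the one-dataset-in, one-dataset-out constraint: an online-serving 
endpoint only ever sees a single request dataset and must return a single 
response dataset, which is precisely the signature a fitted model exposes. The 
names below are invented for illustration and are not a Flink ML API.

```python
# Hypothetical sketch of the online-serving shape; names invented.

class FittedModel:
    """Stand-in for an exported PipelineModel."""
    def __init__(self, factor):
        self.factor = factor

    def transform(self, dataset):
        # The only entry point serving can call: 1 dataset -> 1 dataset.
        return [x / self.factor for x in dataset]

def handle_request(model, request_rows):
    """A serving endpoint receives exactly one dataset per request."""
    return model.transform(request_rows)

exported = FittedModel(factor=8.0)           # produced by the training job
print(handle_request(exported, [4.0, 8.0]))  # -> [0.5, 1.0]
```

Any training-only preprocessing with extra inputs simply has nowhere to attach 
in this path, which is why keeping it out of the fitted pipeline is natural.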

So, Mingliang, if it is not too much trouble, could you give more information 
about your scenarios, and perhaps reconsider them in light of the supplementary 
information above?

[1] 
https://docs.google.com/document/d/1L3aI9LjkcUPoM52liEY6uFktMnFMNFQ6kXAjnz_11do

Sincerely,
Fan Hong



------------------------------------------------------------------
From: 青雉(祁明良) <m...@xiaohongshu.com>
Sent: Tuesday, August 10, 2021 11:36
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

Vote for option 2.
It is similar to what we are doing with Tensorflow.
1. Define the graph in training phase
2. Export model with different input/output spec for online inference

Thanks,
Mingliang

On Aug 10, 2021, at 9:39 AM, Becket Qin 
<becket....@gmail.com<mailto:becket....@gmail.com>> wrote:

estimatorInputs


This communication may contain privileged or other confidential information of 
Red. If you have received it in error, please advise the sender by reply e-mail 
and immediately delete the message and any attachments without copying or 
disclosing the contents. Thank you.
