Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

Timo Walther Tue, 10 Aug 2021 01:04:31 -0700

Hi everyone,

I'm not deeply involved in the discussion, but I quickly checked out the proposed interfaces because they seem to use the Table API heavily, and I would like to leave some feedback here:


I have the feeling that the proposed interfaces are a bit too simplified.

Methods like `Table[] transform(Table... inputs)` are very difficult to handle because they involve a lot of array index magic for implementers and users. Also, the examples are hard to read because of all the index arithmetic going on:



Table output =
  transformer7.transform(
  transformer6.transform(
  transformer5.transform(
  transformer4.transform(
  transformer3.transform(
    transformer2.transform(input2)[0], transformer1.transform(input1)[0]
  )[0])[0])[0])[0])[0];



Table[] compute(Table... inputs) {
    Table output1 = new AOp(...).compute(inputs[0])[0];
    Table output2 = new AOp(...).compute(inputs[1])[0];
    return new BTrainOp(...).compute(output1, output2);
}

Especially for larger pipelines, it will be difficult to distinguish between the main output, statistics, and other side outputs.

Wouldn't it be better to introduce a new concept (maybe even at the Table API level) to express a modular API operator that takes and returns multiple tables? Ideally, those parameters and results would be named and/or tagged so that the following operator can easily distinguish the different result tables and pick what it needs.
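
As a rough illustration of what named results could look like, here is a minimal sketch. Everything in it is hypothetical and not part of the FLIP or of the Table API: `NamedTables`, `NamedOperator`, `Scaler`, and the tags "data", "main" and "statistics" are made-up names, and `Table` is a small stand-in class so that the example compiles without Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical sketch: operators exchange named tables instead of Table[] plus indices. */
public class NamedOutputsSketch {

    /** Stand-in for Flink's Table so the sketch runs without Flink on the classpath. */
    static final class Table {
        final String description;

        Table(String description) {
            this.description = description;
        }
    }

    /** Hypothetical container that maps a name (tag) to a table. */
    static final class NamedTables {
        private final Map<String, Table> tables = new LinkedHashMap<>();

        NamedTables with(String name, Table table) {
            tables.put(name, table);
            return this;
        }

        Table get(String name) {
            Table table = tables.get(name);
            if (table == null) {
                // Fail with the available names rather than an index-out-of-bounds error.
                throw new NoSuchElementException(
                        "No table named '" + name + "', available: " + tables.keySet());
            }
            return table;
        }
    }

    /** Hypothetical modular operator that takes and returns named tables. */
    interface NamedOperator {
        NamedTables apply(NamedTables inputs);
    }

    /** Example operator that produces a main result and a statistics side output. */
    static final class Scaler implements NamedOperator {
        @Override
        public NamedTables apply(NamedTables inputs) {
            Table data = inputs.get("data");
            return new NamedTables()
                    .with("main", new Table("scaled(" + data.description + ")"))
                    .with("statistics", new Table("minmax(" + data.description + ")"));
        }
    }

    public static void main(String[] args) {
        NamedTables inputs = new NamedTables().with("data", new Table("input1"));
        NamedTables outputs = new Scaler().apply(inputs);
        // The caller picks results by name, not by position.
        System.out.println(outputs.get("main").description);
        System.out.println(outputs.get("statistics").description);
    }
}
```

With something along these lines, a downstream operator would ask for `outputs.get("statistics")` instead of remembering that the statistics happen to sit at index 1.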

That would make the interfaces a bit more complicated but would help standardize the communication between modular operators.

Of course this would need a separate design discussion, but non-ML users of the Table API could also benefit from it.


Regards,
Timo


On 10.08.21 07:28, Dong Lin wrote:

Thank you Mingliang for providing the comments.

Currently option-1 proposes Graph/GraphModel/GraphBuilder to build an
Estimator from a graph of Estimator/Transformer, where Estimator could
generate the model (as a Transformer) directly. On the other hand, option-2
proposes AlgoOperator that can be linked into a graph of AlgoOperator.

It seems that option-1 is closer to what TF does than option-2. Could you
double check whether you mean option-1 or option-2?




On Tue, Aug 10, 2021 at 11:29 AM 青雉（祁明良） <m...@xiaohongshu.com> wrote:

Vote for option 2.
It is similar to what we are doing with TensorFlow.
1. Define the graph in the training phase
2. Export the model with a different input/output spec for online inference

Thanks,
Mingliang

On Aug 10, 2021, at 9:39 AM, Becket Qin <becket....@gmail.com> wrote:

estimatorInputs



This communication may contain privileged or other confidential
information of Red. If you have received it in error, please advise the
sender by reply e-mail and immediately delete the message and any
attachments without copying or disclosing the contents. Thank you.
