Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

Becket Qin Tue, 10 Aug 2021 17:49:26 -0700

Thanks for the feedback, Mingliang.

Dong, I think what Mingliang meant by option-2 is the second approach
mentioned in my email, i.e. having a Graph encapsulation, not option 2 in
the FLIP. So he actually meant option 1 of the FLIP. Mingliang, please
correct me if I misunderstood.

Hi Timo,

Thanks for taking a look at the FLIP and giving the feedback.

Having named output tables could be helpful. It would make the code more
readable. That said, we might want to keep both index-based retrieval and
name-based retrieval, because which one is more useful may depend on the
number of outputs. For example, if most Transformers / Estimators have only
one output, indexes are probably more concise. Asking users to always get
the output by name could be a little verbose, and users would also have to
find out the name of the output first. On the other hand, if a stage has a
lot of output tables, named outputs would help.
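
To make the "keep both" idea concrete, here is a minimal sketch, assuming a hypothetical NamedOutputs wrapper that is not part of the FLIP or of any existing Flink API. It is generic so that it runs without Flink on the classpath; in practice T would be Table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical wrapper (illustration only, not part of FLIP-173):
 * outputs of a stage that can be fetched by position or by name.
 */
final class NamedOutputs<T> {
    private final List<T> byIndex = new ArrayList<>();
    private final Map<String, T> byName = new LinkedHashMap<>();

    /** Registers an output; its index is its registration order. */
    NamedOutputs<T> add(String name, T output) {
        if (byName.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate output name: " + name);
        }
        byName.put(name, output);
        byIndex.add(output);
        return this;
    }

    /** Index-based retrieval: concise when a stage has a single output. */
    T get(int index) {
        return byIndex.get(index);
    }

    /** Name-based retrieval: more readable when a stage has many outputs. */
    T get(String name) {
        if (!byName.containsKey(name)) {
            throw new IllegalArgumentException("No output named: " + name);
        }
        return byName.get(name);
    }

    int size() {
        return byIndex.size();
    }
}
```

With something like this, a single-output stage could still be read with get(0), while a stage that also emits statistics could be read with get("statistics"). The output names used here are made up for illustration.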

Another thing is that users can always assign an output table to a
variable, which is equivalent to a named output except that the name is
user-defined. For example,
   Table transformerOutput1 = transformer1.transform(input1)[0];
   Table transformerOutput2 =
       transformer2.transform(input2[0], transformerOutput1)[0];
   Table transformerOutput3 = transformer3.transform(transformerOutput2)[0];
   Table transformerOutput4 = transformer4.transform(transformerOutput3)[0];
   Table transformerOutput5 = transformer5.transform(transformerOutput4)[0];
   Table transformerOutput6 = transformer6.transform(transformerOutput5)[0];
   Table output = transformer7.transform(transformerOutput6)[0];

Does this give users an experience similar to named outputs?

Thanks,

Jiangjie (Becket) Qin

On Tue, Aug 10, 2021 at 4:04 PM Timo Walther <twal...@apache.org> wrote:

> Hi everyone,
>
> I'm not deeply involved in the discussion but I quickly checked out the
> proposed interfaces because it seems they are using Table API heavily
> and would like to leave some feedback here:
>
> I have the feeling that the proposed interfaces are a bit too simplified.
>
> Methods like `Table[] transform(Table... inputs)` are very difficult to
> handle because they involve a lot of array index magic for implementers
> and users. Also the examples are hard to read because of all the index
> arithmetic going on:
>
>
> Table output =
>    transformer7.transform(
>    transformer6.transform(
>    transformer5.transform(
>    transformer4.transform(
>    transformer3.transform(
>      transformer2.transform(input2)[0], transformer1.transform(input1)[0]
>    )[0])[0])[0])[0])[0];
>
>
>
> Table[] compute(Table... inputs) {
>          Table output1 = new AOp(...).compute(inputs[0])[0];
>          Table output2 = new AOp(...).compute(inputs[1])[0];
>          return new BTrainOp(...).compute(output1, output2);
>      }
>
>
> Especially for larger pipelines, it will be difficult to distinguish
> between main output, statistics and other side outputs.
>
> Wouldn't it be better to introduce a new concept (maybe even on Table
> API level), to express a modular API operator that takes and returns
> multiple tables. Ideally, those parameters and results would be named
> and/or tagged such that the following operator can easily distinguish
> the different result tables and pick what is needed.
>
> That would make the interfaces a bit more complicated but help
> standardizing the communication between modular operators.
>
> Of course this would need a separate design discussion, but non-ML
> users of the Table API could benefit from it as well.
>
> Regards,
> Timo
>
>
> On 10.08.21 07:28, Dong Lin wrote:
> > Thank you Mingliang for providing the comments.
> >
> > Currently option-1 proposes Graph/GraphModel/GraphBuilder to build an
> > Estimator from a graph of Estimator/Transformer, where Estimator could
> > generate the model (as a Transformer) directly. On the other hand,
> > option-2 proposes AlgoOperator that can be linked into a graph of
> > AlgoOperator.
> >
> > It seems that option-1 is closer to what TF does than option-2. Could you
> > double check whether you mean option-1 or option-2?
> >
> >
> >
> >
> > On Tue, Aug 10, 2021 at 11:29 AM 青雉（祁明良） <m...@xiaohongshu.com> wrote:
> >
> >> Vote for option 2.
> >> It is similar to what we are doing with Tensorflow.
> >> 1. Define the graph in training phase
> >> 2. Export model with different input/output spec for online inference
> >>
> >> Thanks,
> >> Mingliang
> >>
> >> On Aug 10, 2021, at 9:39 AM, Becket Qin <becket....@gmail.com> wrote:
> >>
> >> estimatorInputs
> >>
> >>
> >>
> >>
> >> This communication may contain privileged or other confidential
> >> information of Red. If you have received it in error, please advise the
> >> sender by reply e-mail and immediately delete the message and any
> >> attachments without copying or disclosing the contents. Thank you.
> >>
>
>
