Hi everyone,

Just FYI, if there are no further suggestions on the FLIP, we plan to start the voting thread this Friday, 9/10.
Thanks,
Dong

On Fri, Aug 27, 2021 at 10:32 AM Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:

> Thanks for the post, Dong :)
>
> We welcome everyone to drop us an email on Flink ML. Let's work together to build machine learning on Flink :)
>
> Dong Lin <lindon...@gmail.com> wrote on Wed, Aug 25, 2021 at 8:58 PM:
>
>> Hi everyone,
>>
>> Based on the feedback received in the online/offline discussions over the past few weeks, we (Zhipeng, Fan, myself and a few other developers at Alibaba) have reached an agreement on the design to support a DAG of algorithms. We have merged the ideas from the initial two options into this FLIP-176 <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=184615783> design doc.
>>
>> If you have comments on the latest design doc, please let us know!
>>
>> Cheers,
>> Dong
>>
>> On Mon, Aug 23, 2021 at 5:07 PM Becket Qin <becket....@gmail.com> wrote:
>>
>>> Thanks for the comments, Fan. Please see the reply inline.
>>>
>>> On Thu, Aug 19, 2021 at 10:25 PM Fan Hong <hongfa...@gmail.com> wrote:
>>>
>>> > Hi, Becket,
>>> >
>>> > Many thanks for your detailed review. I agree that it is easier to involve more people in the discussion if the fundamental differences are highlighted.
>>> >
>>> > Here are some of my thoughts to help other people think about these differences. (Correct me if any technical details are wrong.)
>>> >
>>> > 1. One set of APIs or not? Maybe not that important.
>>> >
>>> > First of all, AlgoOperators and Pipeline / Transformer / Estimator in Proposal 2 are absolutely *NOT* independent.
>>> >
>>> > One may think they are independent because Pipeline / Transformer / Estimator are already in the Flink ML Lib while AlgoOperators were added later in this proposal. But that's not true. If you check Alink [1], where the idea of Proposal 2 originated, both have been present for a long time, and they collaborate tightly.
>>> >
>>> > In terms of functionality, they are also not independent. Their relation is more like a two-level API for specifying ML tasks: AlgoOperators are a general-purpose level that can represent any ML algorithm, while Pipeline / Transformer / Estimator provides a higher-level API which enables wrapping multiple ML algorithms together in a fit-transform way.
>>>
>>> We probably need to first clarify what "independent" means here. Sure, users can always wrap a Transformer into an AlgoOperator, but users can basically wrap any code, any class into an AlgoOperator. And we wouldn't say AlgoOperator is not independent of any class, right? In my opinion, the two APIs are independent because even if we agree that Transformers do things that are conceptually a subset of what AlgoOperators do, a Transformer cannot be used as an AlgoOperator out of the box without wrapping. And even worse, a MIMO AlgoOperator cannot be wrapped into a Transformer / Estimator if these two APIs are SISO. So from what I see, in Option 2 these two APIs are independent from an API design perspective.
>>>
>>> > One could consider Flink DataStream - Table as an analogy to AlgoOperators - Pipeline. The two-level APIs provide different functionality to end users, and the higher-level API calls the lower-level API in its internal implementation. I'm not saying the two-level API design in Proposal 2 is good because Flink already did this.
>>> > I just hope to help people in the community understand the relation between AlgoOperators and Pipeline.
>>>
>>> I am not sure it is accurate to say DataStream is a low-level API of Table. They are simply two different DSLs: one for the relational / SQL-like analytics paradigm, and the other for those who are more familiar with streaming applications. More importantly, they are designed to support conversion from one to the other out of the box, which is unlike Pipeline and AlgoOperators in proposal 2.
>>>
>>> > An additional use and benefit of the Pipeline API is that a SISO PipelineModel corresponds exactly to a deployable unit for online serving.
>>> >
>>> > In online serving, the Flink runtime is usually avoided to achieve low latency, so models have to be wrapped for transmission from the Flink ecosystem to a non-Flink one. This is the place where the wrapping is really needed and inevitable, because serving services are usually expected to be generic over one type of model. The Pipeline API in Proposal 2 targets exactly this scenario without complicating the API.
>>> >
>>> > Yet offline or nearline inference can be completed within the Flink ecosystem. That's where the Flink ML Lib still exists, so a loose wrapping using AlgoOperators in Proposal 2 still works without much overhead.
>>>
>>> It seems that a MIMO Transformer can easily support all SISO use cases, right? And there is zero overhead, because users may not have to wrap AlgoOperators at all: they can just build a Pipeline directly by putting either Transformers or AlgoOperators into it, without worrying about whether they are interoperable.
>>>
>>> > At the same time, these two levels of APIs are not redundant in their functionality; they have to collaborate to build ML tasks.
>>> >
>>> > The AlgoOperator API is self-consistent and complete for constructing ML tasks, but if users seek to wrap a sequence of subtasks, especially for online serving, the Pipeline / Transformer / Estimator API is indispensable. On the other side, the Pipeline / Transformer / Estimator API lacks completeness, even in the extended version plus the Graph API in Proposal 1 (see the last case in [4]), so it cannot replace the AlgoOperator API.
>>> >
>>> > One case of their collaboration lies in my response to Mingliang's recommendation scenarios, where AlgoOperators + Pipeline can provide cleaner usage than the Graph API.
>>>
>>> I think the link/linkFrom API is more like a convenient wrapper around fit/transform/compute. Functionality-wise, they are equivalent. The Graph/GraphBuilder API, on the other hand, is an encapsulation design on top of Estimator/Transformer/AlgoOperators. Without the GraphBuilder/Graph API, users would have to create their own class to encapsulate the code, just like without the Pipeline API users would have to create their own class to wrap the pipeline logic. So I don't think we should compare link/linkFrom with the Graph/GraphBuilder API, because they serve different purposes. Even though both of them describe a DAG, GraphBuilder describes it for encapsulation while link/linkFrom does not.
>>>
>>> > 2. What are the core semantics of Pipeline / Transformer / Estimator?
>>> >
>>> > I will not give my answer because I can't. I think it would be difficult to reach an agreement on this.
>>> > But I did two things, and hope they can provide some hints.
>>> >
>>> > One thing was to seek answers from other ML libraries. Scikit-learn and Spark ML are well-known general-purpose ML libraries.
>>> >
>>> > Spark ML gives the definitions of Pipeline / Transformer / Estimator in its documentation, which I quote as follows [2]:
>>> >
>>> >> *Transformer* <https://spark.apache.org/docs/latest/ml-pipeline.html#transformers>: A Transformer is an algorithm which can transform *one* DataFrame into *another* DataFrame. E.g., an ML model is a Transformer which transforms *a* DataFrame with features into *a* DataFrame with predictions.
>>> >> *Estimator* <https://spark.apache.org/docs/latest/ml-pipeline.html#estimators>: An Estimator is an algorithm which can be fit on *a* DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.
>>> >> *Pipeline* <https://spark.apache.org/docs/latest/ml-pipeline.html#pipeline>: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.
>>> >
>>> > Spark ML clearly declares the number of inputs and outputs for the Estimator and Transformer APIs. Scikit-learn does not give a clear definition; instead, it presents its APIs [3]:
>>> >
>>> >> *Estimator:* The base object, implements a fit method to learn from data, either:
>>> >> estimator = estimator.fit(data, targets)
>>> >> or:
>>> >> estimator = estimator.fit(data)
>>> >>
>>> >> *Transformer:* For filtering or modifying the data, in a supervised or unsupervised way, implements:
>>> >> new_data = transformer.transform(data)
>>> >> When fitting and transforming can be performed much more efficiently together than separately, implements:
>>> >> new_data = transformer.fit_transform(data)
>>> >
>>> > In these API signatures, exactly 1 input and 1 output are defined.
>>> >
>>> > The other thing I did was to look for concepts in Big Data APIs that are analogous to Pipeline / Transformer / Estimator in ML APIs, so that non-ML developers may better understand their positions in ML APIs.
>>> >
>>> > In the end, I think 'map' in the MapReduce paradigm may be a fair analogy that is easy for everyone to understand. One may think of 'map' as the MapFunction or FlatMapFunction in Flink, or the Mapper in Hadoop. As far as I know, no Big Data API tries to extend 'map' to support multiple inputs or outputs while still keeping the original name. In Flink, there exist co-Map and co-FlatMap, which can be considered extensions, yet they do not use the name 'map'.
>>> >
>>> > So, is the core semantic of 'map' a conversion from data to data, or from one dataset to another dataset? With either answer, the fact is that no one breaks the usage convention of 'map'.
>>>
>>> This is an interesting discussion. First of all, I don't think "map" is a good comparison here. This method is always defined on a class representing a data collection, so there is no actual data input to the method at all. The only semantic that makes sense is to operate on the data collection the `map` method was defined on; and the parameter of `map` is the processing logic, of which it would also be weird to have more than one.
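>>> For reference, the relevant Flink signatures look roughly like this (abridged from the DataStream API; the real interfaces also extend Serializable):
>>>
>>>   // Single input, single output by construction: one value in, one value out.
>>>   public interface MapFunction<T, O> extends Function {
>>>       O map(T value) throws Exception;
>>>   }
>>>
>>>   // The two-input variant deliberately uses different method names (map1/map2)
>>>   // instead of stretching the name 'map' across multiple inputs.
>>>   public interface CoMapFunction<IN1, IN2, OUT> extends Function {
>>>       OUT map1(IN1 value) throws Exception;
>>>       OUT map2(IN2 value) throws Exception;
>>>   }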
>>> Regarding scikit-learn and Spark, every API design has its own context, targeted use cases and design goals. I think it is more important to analyze and understand the reasons WHY their APIs look like that, and whether they are good designs, instead of just following WHAT they look like.
>>>
>>> In my opinion, one primary reason is that the Spark and scikit-learn Pipelines assume that all the samples, whether for training or inference, are well prepared. They basically exclude the data preparation step from the Pipeline. Take recommendation systems as an example: it is quite typical that the samples are generated based on user behaviors stored in different datasets, such as exposures and clicks, and maybe also user profiles stored in relational databases. So MIMO is a must in this case. Today, data preparation is out of the scope of scikit-learn and not included in its Pipeline API; people usually use other tools such as Pandas or Spark DataFrame to prepare the data.
>>>
>>> In order to discuss whether a MIMO Pipeline makes sense, we need to think about whether it is valuable to include data preparation in the Pipeline as well. Personally, I think it is a good extension and I don't see much harm in doing so. A MIMO Pipeline API would also support SISO Pipelines, so it is conceptually backwards compatible. Algorithms that only make sense as SISO can stay just as they are; the only difference is that instead of returning an output Table, they return a single-table array. And those who do need MIMO support are also made happy. Therefore, it looks like a useful feature with little cost.
>>>
>>> BTW, I have asked a few AI practitioners, including some from industry and academia. The concept of a MIMO Pipeline itself seems well accepted. Somewhat surprisingly, although the concept of Transformer / Estimator is understood by most people I talked to, they are not familiar with what a Transformer / Estimator should look like. I think this is partly because the ML Pipeline is a well-known concept without a well-agreed API. In fact, even Spark and scikit-learn have quite different designs for Estimator / Transformer / Pipeline when it comes to the details.
>>>
>>> > 3. About potential inconsistent availability of algorithms
>>> >
>>> > Becket has mentioned that developers may be confused about how to implement the same algorithm in the two levels of APIs in Proposal 2.
>>> >
>>> > If one accepts the relation between the AlgoOperator API and the Pipeline API described above, then it is not a problem: it is natural for developers to implement their algorithms as AlgoOperators, and to call those AlgoOperators in Estimators/Transformers.
>>> >
>>> > If not, I propose a rough idea here:
>>> >
>>> > An abstract class AlgoOpEstimatorImpl is provided as a subclass of Estimator. It has a method named getTrainOp() which returns the AlgoOperator where the computation logic resides. The other code in AlgoOpEstimatorImpl is fixed. In this way, developers of the Flink ML Lib are asked to implement Estimators by inheriting from AlgoOpEstimatorImpl.
>>> >
>>> > Other solutions are also possible, but may still need some community convention.
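>>> > To make the rough idea above concrete, a minimal sketch could look like the following. All names and signatures here are illustrative only: it assumes an Estimator with a fit(Table...) method and an AlgoOperator that computes output tables from input tables, as in Proposal 2; the actual interfaces are still under discussion.
>>> >
>>> >   // Illustrative sketch, not a committed API.
>>> >   public abstract class AlgoOpEstimatorImpl implements Estimator {
>>> >       // The only method an algorithm developer implements: it returns the
>>> >       // AlgoOperator where the training computation logic resides.
>>> >       protected abstract AlgoOperator getTrainOp();
>>> >
>>> >       // The rest is fixed boilerplate shared by all algorithms.
>>> >       @Override
>>> >       public Model fit(Table... inputs) {
>>> >           Table[] modelData = getTrainOp().compute(inputs);
>>> >           return new AlgoOpModel(modelData); // hypothetical generic Model wrapper
>>> >       }
>>> >   }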
>>> > I would also like to mention that the same issue exists in Proposal 1, as there are also multiple places where developers can implement algorithms.
>>>
>>> I am not sure I fully understand what "there are also multiple places where developers can implement algorithms" means. It is always the algorithm authors' call how to implement the interfaces. Implementation-wise, it is OK to have an abstract class such as AlgoImpl; the algorithm authors can choose whether or not to leverage it. But in either case, the end users won't see the implementation class and should only rely on public interfaces such as Estimator / Transformer / AlgoOperator, etc.
>>>
>>> > In summary, I think the first and second issues above are matters of preference, and I hope my thoughts can give some clues. The third issue can be considered a common technical problem in both proposals. We may work together to seek better solutions.
>>> >
>>> > Sincerely,
>>> >
>>> > Fan Hong
>>> >
>>> > [1] https://github.com/alibaba/Alink
>>> > [2] https://spark.apache.org/docs/latest/ml-pipeline.html
>>> > [3] https://scikit-learn.org/stable/developers/develop.html
>>> > [4] https://docs.google.com/document/d/1L3aI9LjkcUPoM52liEY6uFktMnFMNFQ6kXAjnz_11do
>>> >
>>> > On Tue, Jul 20, 2021 at 11:42 AM Becket Qin <becket....@gmail.com> wrote:
>>> >
>>> >> Hi Dong, Zhipeng and Fan,
>>> >>
>>> >> Thanks for the detailed proposals. It is quite a lot of reading! Given that we are introducing a lot of stuff here, I find that it might be easier for people to discuss if we list the fundamental differences first. From what I understand, the very fundamental difference between the two proposals is the following:
>>> >>
>>> >> * In order to support a graph structure, do we extend Transformer/Estimator, or do we introduce a new set of APIs? *
>>> >>
>>> >> Proposal 1 tries to keep one set of APIs, based on Transformer/Estimator/Pipeline. More specifically, it does the following:
>>> >>   - Make Transformer and Estimator multi-input and multi-output (MIMO).
>>> >>   - Introduce Graph/GraphModel as counterparts of Pipeline/PipelineModel.
>>> >>
>>> >> Proposal 2 leaves the existing Transformer/Estimator/Pipeline as is. Instead, it introduces AlgoOperators to support the graph structure. The AlgoOperators are general-purpose graph nodes supporting MIMO. They are independent of Pipeline / Transformer / Estimator.
>>> >>
>>> >> My two cents:
>>> >>
>>> >> I think it is a big advantage to have a single set of APIs rather than two independent sets, if possible. But I would suggest we change the current proposal 1 a little bit, by learning from proposal 2.
>>> >>
>>> >> What I like about proposal 1:
>>> >> 1. A single set of APIs, symmetric in Graph/GraphModel and Pipeline/PipelineModel.
>>> >> 2. Keeping most of the benefits of Transformer/Estimator, including the fit-then-transform relation and the save/load capability.
>>> >>
>>> >> However, proposal 1 also introduces some changes that I am not sure about:
>>> >>
>>> >> 1. The most controversial part of proposal 1 is whether we should extend Transformer/Estimator/Pipeline. In fact, different projects have slightly different designs for Transformer/Estimator/Pipeline.
>>> >> So I think it is OK to extend it. However, there are some commonly recognized core semantics that we ideally want to keep. To me these core semantics are:
>>> >>   1. Transformer is a Data -> Data conversion; Estimator deals with the Data -> Model conversion.
>>> >>   2. Estimator.fit() gives a Transformer, and users can just call Transformer.transform() to perform inference.
>>> >> As long as these core semantics are kept, an extension to the API seems acceptable to me.
>>> >>
>>> >> Proposal 1 extends the semantics of Transformer from a Data -> Data conversion to a generic Table -> Table conversion, and claims it is equivalent to the "AlgoOperator" in proposal 2 as a general-purpose graph node. It does change the first semantic. That said, this might just be a naming problem. One possible solution is to have a new subclass of Stage without strong conventional semantics, e.g. "AlgoOp" if we borrow the name from proposal 2, and let Transformer extend it. Just like a PipelineModel is a more specific Transformer, a Transformer would be a more specific "AlgoOp". If we do that, the processing logic that people don't feel comfortable calling a Transformer can just be put into an "AlgoOp" and thus can still be added to a Pipeline / Graph. This borrows the advantage of proposal 2. In other words, this essentially makes "AlgoOp" the equivalent of "AlgoOperator" in proposal 2, but allows it to be added to the Graph and Pipeline if people want to.
>>> >>
>>> >> This also covers my thoughts regarding the concern that making Transformer/Estimator MIMO would break the convention of a single-input single-output (SISO) Transformer/Estimator. Since this does not change the core semantics of Transformer/Estimator, it sounds like an intuitive extension to me.
>>> >>
>>> >> 2. Another semantics-related case brought up was heterogeneous topologies in training and inference. In that case, the input of an Estimator would be different from the input of the Transformer returned by Estimator.fit(). The example for this case is Word2Vec, where the input of the Estimator would be an article while the input to the Transformer is a single word. The well-recognized ML Pipeline doesn't seem to support this case, because it assumes the inputs of the Estimator and the corresponding Transformer are the same.
>>> >>
>>> >> Both proposal 1 and proposal 2 leave this case unsupported in the Pipeline. To support it:
>>> >>   - Proposal 1 adds support for such cases in the Graph/GraphModel by introducing "EstimatorInput" and "TransformerInput". The downside is that it complicates the API.
>>> >>   - Proposal 2 leaves it to users to construct two different DAGs for training and inference respectively. This means users would have to construct the DAG twice even if most parts of the DAG are the same in training and inference.
>>> >>
>>> >> My gut feeling is that this is not a critical difference, because such a heterogeneous topology is sort of a corner case. Most users do not need to worry about this. For those who do need it, either proposal 1 or proposal 2 seems acceptable to me.
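>>> >> For concreteness, the "two DAGs" flavor in proposal 2 might look roughly like the following for the Word2Vec example. The operator names here are made up purely for illustration; only the overall shape of the code matters.
>>> >>
>>> >>   // Training DAG: the training logic consumes article-level data.
>>> >>   AlgoOperator trainOp = new Word2VecTrainOp();
>>> >>   trainOp.linkFrom(articlesSource);
>>> >>
>>> >>   // Inference DAG, constructed separately: it consumes word-level data
>>> >>   // and reuses the model tables produced by the training DAG.
>>> >>   AlgoOperator predictOp = new Word2VecPredictOp(trainOp.getOutput());
>>> >>   predictOp.linkFrom(wordsSource);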
>>> >> That said, it looks like with proposal 1 users can still choose to write the program twice without using the Graph API, just like what they would do in proposal 2. So technically speaking, proposal 1 is more flexible and allows users to choose either flavor. On the other hand, one could argue that proposal 1 may confuse users with these two flavors. Although personally it seems clear to me, I am open to other ideas.
>>> >>
>>> >> 3. Lastly, there was a concern about proposal 1: that some Estimators can no longer be added to the Pipeline, while the original Pipeline accepts any Estimator.
>>> >>
>>> >> It seems that users always have to make sure the input schema required by the Estimator matches the input table. So even with the existing Pipeline, people cannot naively add any Estimator into a pipeline. Admittedly, proposal 1 adds some more requirements, namely 1) the number of inputs needs to match the number of outputs of the previous stage, and 2) the Estimator must not generate a Transformer with a different required input schema (the heterogeneous case mentioned above). However, given that these mismatches will result in exceptions at compile time, just as when users put in an Estimator with a mismatched input schema, I personally find that it does not change the user experience much.
>>> >>
>>> >> So, to summarize my thoughts on this fundamental difference:
>>> >>   - In general, I personally prefer having one set of APIs.
>>> >>   - The current proposal 1 may need some improvements in some cases, by borrowing ideas from proposal 2.
>>> >>
>>> >> A few other differences that I consider non-fundamental:
>>> >>
>>> >> * Do we need a top-level encapsulation API for an algorithm? *
>>> >>
>>> >> Proposal 1 has the concept of a Graph which encapsulates an entire algorithm to provide a unified API following the same semantics as Estimator/Transformer. Users can choose not to package everything into a Graph, but instead write their own program and wrap it as an ordinary function.
>>> >>
>>> >> Proposal 2 does not have a top-level API such as Graph. Instead, users can choose to write an arbitrary function if they want to.
>>> >>
>>> >> From what I understand, in proposal 1 users may still choose to ignore the Graph API and simply construct a DAG by themselves by calling transform() and fit(), or by calling AlgoOp.process() if we add "AlgoOp" to proposal 1 as I suggested earlier. So Graph is just an additional way to construct a graph - people can use Graph in a similar way as they use Pipeline/PipelineModel. In other words, there is no conflict between proposal 1 and proposal 2.
>>> >>
>>> >> * The ways to describe a Graph? *
>>> >>
>>> >> Proposal 1 gives two ways to construct a DAG:
>>> >>   1. the raw API using Estimator/Transformer (potentially "AlgoOp" as well);
>>> >>   2. the GraphBuilder API.
>>> >>
>>> >> Proposal 2 only gives the raw API of AlgoOperator. It assumes there is a main output and some other side outputs, so it can call algoOp1.linkFrom(algoOp2) without specifying the index of the output, at the cost of wrapping all the Tables into an AlgoOperator.
>>> >>
>>> >> The usability argument was mostly around the raw APIs.
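>>> >> For a feel of the two raw flavors, chaining the same stages might read roughly as follows. The variable and method names follow the examples used earlier in this thread; the actual signatures in both proposals may differ.
>>> >>
>>> >>   // Proposal 1 raw flavor: Tables flow explicitly through fit()/transform().
>>> >>   Transformer model = estimator.fit(trainTable);
>>> >>   Table predictions = model.transform(testTable);
>>> >>
>>> >>   // Proposal 2 raw flavor: operators are linked, and the Tables stay wrapped
>>> >>   // inside AlgoOperators, with a main output plus optional side outputs.
>>> >>   trainOp.linkFrom(trainSourceOp);
>>> >>   predictOp.linkFrom(trainOp, testSourceOp);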
>>> >> I don't think the two APIs differ too much from each other. Under the same assumptions, proposal 1 and proposal 2 can probably achieve very similar levels of usability when describing a Graph, if not exactly the same.
>>> >>
>>> >> There are some other differences/arguments between the two proposals. However, I don't think they are fundamental, and just like the cases mentioned above, the two proposals can easily learn from each other.
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Jiangjie (Becket) Qin
>>> >>
>>> >> On Thu, Jul 1, 2021 at 7:29 PM Dong Lin <lindon...@gmail.com> wrote:
>>> >>
>>> >>> Hi all,
>>> >>>
>>> >>> Zhipeng, Fan (cc'ed) and I are opening this thread to discuss two different designs to extend the Flink ML API to support more use cases, e.g. expressing a DAG of preprocessing and training logic. These two designs have been documented in FLIP-173 <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=184615783>.
>>> >>>
>>> >>> We have different opinions on the usability and the ease of understanding of the proposed APIs. It would be really useful to have comments on these designs from the open-source community and to learn your preferences.
>>> >>>
>>> >>> To facilitate the discussion, we have summarized our design principles and opinions in this Google doc <https://docs.google.com/document/d/1L3aI9LjkcUPoM52liEY6uFktMnFMNFQ6kXAjnz_11do>. Code snippets for a few example use cases are also provided in the doc to demonstrate the difference between the two solutions.
>>> >>>
>>> >>> This Flink ML API is super important to the future of the Flink ML library. Please feel free to reply to this email thread or comment in the Google doc directly.
>>> >>>
>>> >>> Thank you!
>>> >>> Dong, Zhipeng, Fan
>
> --
> best,
> Zhipeng