Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-08-18 Thread Shuiqiang Chen
Hi Robert, Thank you for the reminder! I have added the wiki page [1] for this FLIP. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs On Wed, Aug 14, 2019 at 5:56 PM Robert Metzger wrote: > It seems that this FLIP doesn't have a Wiki page yet [1], even though it is

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-08-14 Thread Robert Metzger
It seems that this FLIP doesn't have a Wiki page yet [1], even though it is already partially implemented [2]. We should try to stick more closely to the FLIP process to manage the project more efficiently. [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals [2] https://issue

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-17 Thread Gen Luo
Hi all, In the review of the PR for FLINK-12473, there were a few comments regarding pipeline export. We would like to start a follow-up discussion to address some of these comments. Currently, the FLIP-39 proposal gives users a way to persist a pipeline in JSON format, but it does not specify h

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-04 Thread Shaoxuan Wang
Stavros, They share a similar logical concept, but the implementation details are quite different, so it is hard to migrate the interface across such different implementations. The built-in algorithms are a useful legacy that we will consider migrating to the new API (though still with different implementations). BT

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-03 Thread Stavros Kontopoulos
Hi, Couldn't some portion of the code be migrated to the new Table API? I say that because the new API design is based on scikit-learn, and the old one was also inspired by it. Best, Stavros On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang wrote: > Another consensus (from the offline discussi

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-22 Thread Shaoxuan Wang
Another consensus (from the offline discussion) is that we will delete/deprecate flink-libraries/flink-ml. I have started a survey and discussion [1] in dev/user-ml to collect feedback. Depending on the replies, we will decide whether to delete it in Flink 1.9 or to deprecate it and delete it in the next re

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-21 Thread Gen Luo
Yes, this is our conclusion. I'd like to add only one point: registering user-defined aggregators is also needed, which is currently provided by the 'bridge' module and will eventually be merged into the Table API. The same goes for collect(). I will add a TableEnvironment argument in Estimator.fit() and Transformer
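The plan above passes a TableEnvironment into Estimator.fit() and Transformer.transform(). What follows is a minimal sketch of that signature shape, assuming heavily simplified stand-ins: `ToyTableEnv` and `ToyTable` are hypothetical placeholders for Flink's TableEnvironment and Table (not the real classes), and `MeanCenterer`/`MeanCenterModel` are illustrative names, not part of flink-ml. The environment is used only to reject streaming input, one hypothetical example of why an estimator might need it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Flink TableEnvironment: only records batch vs. streaming mode.
final class ToyTableEnv {
    final boolean batch;
    ToyTableEnv(boolean batch) { this.batch = batch; }
}

// Hypothetical stand-in for a Flink Table: one numeric column held in memory.
final class ToyTable {
    final List<Double> values;
    ToyTable(List<Double> values) { this.values = values; }
}

// The environment is passed explicitly alongside the input table.
interface Transformer {
    ToyTable transform(ToyTableEnv env, ToyTable input);
}

// A fitted model is itself a Transformer.
interface Model extends Transformer {}

interface Estimator<M extends Model> {
    M fit(ToyTableEnv env, ToyTable input);
}

// Illustrative estimator: learns the column mean and returns a model that subtracts it.
final class MeanCenterer implements Estimator<MeanCenterModel> {
    @Override
    public MeanCenterModel fit(ToyTableEnv env, ToyTable input) {
        // Hypothetical use of the environment: this toy only supports batch mode.
        if (!env.batch) {
            throw new IllegalArgumentException("this toy estimator only supports batch mode");
        }
        // Reading the whole column back stands in for a collect()-style step.
        double sum = 0.0;
        for (double v : input.values) sum += v;
        return new MeanCenterModel(input.values.isEmpty() ? 0.0 : sum / input.values.size());
    }
}

final class MeanCenterModel implements Model {
    final double mean;
    MeanCenterModel(double mean) { this.mean = mean; }

    @Override
    public ToyTable transform(ToyTableEnv env, ToyTable input) {
        List<Double> out = new ArrayList<>();
        for (double v : input.values) out.add(v - mean);
        return new ToyTable(out);
    }
}

public class FitTransformSketch {
    public static void main(String[] args) {
        ToyTableEnv env = new ToyTableEnv(true);
        ToyTable data = new ToyTable(List.of(1.0, 2.0, 3.0));
        MeanCenterModel model = new MeanCenterer().fit(env, data);
        System.out.println(model.mean);                        // 2.0
        System.out.println(model.transform(env, data).values); // [-1.0, 0.0, 1.0]
    }
}
```

Passing the environment explicitly avoids hidden global state, at the cost of the API-cleanliness concern raised in the 2019-05-17 reply about carrying a non-ML concept in the signature.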

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-21 Thread Aljoscha Krettek
We discussed this in private and came to the conclusion that we should (for now) have the dependency on flink-table-api-xxx-bridge because we need access to the collect() method, which is not yet available in the Table API. Once that is available, the code can be refactored, but for now we want to

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Gen Luo
Thanks for your reply. As for the first question, it's not strictly necessary. But I prefer not to have a TableEnvironment argument in Estimator.fit() or Transformer.transform(): it is not a machine learning concept, and it may make our API less clean than those of other systems. I would l

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Aljoscha Krettek
Hi, Why is it necessary to acquire a TableEnvironment from a Table? I think you even said yourself what we should do: "I believe it's better to make the api clean and hide the detail of implementation as much as possible." In my opinion this means we can only depend on the generic Table API mo

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Gen Luo
Indeed, it's better not to depend on flink-table-planner. It's currently needed for three things: registering UDAGGs (user-defined aggregate functions), determining whether the tableEnv is batch or streaming, and converting a Table to a DataSet to collect data. Most of these requirements can be fulfilled by flink-table-api-java-bridge and flink-table-api-scala-

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-16 Thread Aljoscha Krettek
Hi, I had a look at the document mostly from a module structure/dependency structure perspective. We should make the expected dependency structure explicit in the document. From the discussion in the doc it seems that the intention is that flink-ml-lib should depend on flink-table-planner (the

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-10 Thread Shaoxuan Wang
Hi everyone, I created the umbrella Jira FLINK-12470 for FLIP-39 and added an "implementation plan" section to the Google doc (https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx).

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-06 Thread Rong Rong
Thanks for following up promptly and sharing the feedback, @shaoxuan. Yes, I share your view that these two FLIPs should eventually converge. I also have some questions regarding the API as well as the possible convergence challenges (especially the current Co-processor approach vs. FLIP-39'

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-06 Thread Shaoxuan Wang
Thanks for the feedback, Rong and Flavio. @Rong Rong > There's another thread regarding a close-to-merge FLIP-23 implementation > [1]. I agree this might still be an early stage to talk about productionizing > and model serving. But it would be nice to keep the design/implementation in > mind that: ea

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-02 Thread Flavio Pompermaier
Hi all, I have read many discussions about Flink ML, and none of them takes into account the ongoing efforts carried out by the Streamline H2020 project [1] on this topic. Have you tried to ping them? I think both projects could benefit from a joint effort here. [1] https://h2020

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-01 Thread Rong Rong
Hi Shaoxuan/Weihua, Thanks for the proposal and for driving the effort. I also replied to the original discussion thread, and I'm still +1 on moving towards the scikit-learn model. I just left a few comments on the API details and some general questions. Please take a look. There's another thread r

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-30 Thread Shaoxuan Wang
Thanks for all the feedback. @Jincheng Sun > I recommend adding a detailed implementation plan to the FLIP and the Google doc. Yes, I will add a subsection for the implementation plan. @Chen Qin > Just sharing some insights from operating SparkML at scale > - map-reduce may not be the best way to it

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread Chen Qin
Just sharing some insights from operating SparkML at scale: - map-reduce may not be the best way to iteratively sync partitioned workers. - native hardware acceleration is key to adopting rapid ML improvements in the foreseeable future. Chen On Apr 29, 2019, at 11:02, jincheng sun wrote: > >

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread jincheng sun
Hi Shaoxuan, Thanks for your efforts to enhance the scalability and ease of use of Flink ML and take it one step further, and for sharing so much context. Big +1 for this proposal! I have only one suggestion: there is only a short time left until the release

[DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread Shaoxuan Wang
Hi everyone, Several months ago, Weihua proposed rebuilding the Flink ML pipeline on top of the Table API in this mail thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html Luogen, Becket, Xu, Weihua and I have been working on this
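The proposal rebuilds the pipeline in the scikit-learn style, where an estimator is fit into a model that is itself a transformer, and a pipeline chains such stages. A rough sketch of that chaining logic, under stated assumptions: a "table" is reduced to a single `List<Double>` column, the TableEnvironment argument is omitted for brevity, and `PipelineStage`, `MapTransformer`, `MaxAbsScaler`, and `Pipeline.append` are hypothetical names chosen for illustration rather than the FLIP-39 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical marker type shared by all pipeline stages (illustration only).
interface PipelineStage {}

// Stand-in: a "table" is reduced to one numeric column held in a List<Double>.
interface Transformer extends PipelineStage {
    List<Double> transform(List<Double> input);
}

interface Estimator extends PipelineStage {
    // Fitting yields a model, which is itself a Transformer.
    Transformer fit(List<Double> input);
}

// Stateless transformer that applies a function to every value.
final class MapTransformer implements Transformer {
    private final DoubleUnaryOperator fn;
    MapTransformer(DoubleUnaryOperator fn) { this.fn = fn; }

    @Override
    public List<Double> transform(List<Double> input) {
        List<Double> out = new ArrayList<>();
        for (double v : input) out.add(fn.applyAsDouble(v));
        return out;
    }
}

// Learns the largest absolute value, then divides by it so values land in [-1, 1].
final class MaxAbsScaler implements Estimator {
    @Override
    public Transformer fit(List<Double> input) {
        double maxAbs = 0.0;
        for (double v : input) maxAbs = Math.max(maxAbs, Math.abs(v));
        final double scale = maxAbs == 0.0 ? 1.0 : maxAbs; // avoid dividing by zero
        return new MapTransformer(v -> v / scale);
    }
}

// Chains stages in order; fitting it produces a fitted pipeline model.
final class Pipeline implements Estimator {
    private final List<PipelineStage> stages = new ArrayList<>();

    Pipeline append(PipelineStage stage) {
        if (!(stage instanceof Transformer) && !(stage instanceof Estimator)) {
            throw new IllegalArgumentException("stage must be a Transformer or an Estimator");
        }
        stages.add(stage);
        return this;
    }

    @Override
    public Transformer fit(List<Double> input) {
        List<Transformer> fitted = new ArrayList<>();
        List<Double> current = input;
        for (PipelineStage stage : stages) {
            // Estimators are fit on the output of all preceding stages.
            Transformer t = (stage instanceof Estimator)
                    ? ((Estimator) stage).fit(current)
                    : (Transformer) stage;
            fitted.add(t);
            current = t.transform(current);
        }
        // The fitted model replays every fitted stage in sequence.
        return data -> {
            List<Double> out = data;
            for (Transformer t : fitted) out = t.transform(out);
            return out;
        };
    }
}

public class PipelineSketch {
    public static void main(String[] args) {
        Transformer model = new Pipeline()
                .append(new MapTransformer(v -> v - 1.0)) // plain transformer
                .append(new MaxAbsScaler())               // estimator, fit on shifted data
                .fit(List.of(1.0, 3.0, 5.0));
        System.out.println(model.transform(List.of(1.0, 5.0, 9.0))); // [0.0, 1.0, 2.0]
    }
}
```

Because each estimator is fit on the output of the stages before it, data at fit time and at transform time flows through the same sequence of steps, which is the property that makes the scikit-learn pipeline model attractive.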