Hi everyone,

I'm reviving this really old discussion thread because I just stumbled across Gelly again and realized that this discussion was never finished.
I'll open up a vote thread for dropping the current DataSet-based Gelly library.

Best regards,

Martijn

On 2022/01/05 03:37:18 Yun Gao wrote:
> Hi,
>
> Many thanks for initiating the discussion!
>
> Also +1 to drop the current DataSet-based Gelly library so that we can finally drop the legacy DataSet API.
>
> As for whether to keep the graph computing ability: from my side, graph query / graph computing, and chaining them with the preprocessing pipeline, is a real requirement. We also already have the basis for a graph computing library on the DataStream API with the new iteration library [1], so it would already be feasible to have a stream / batch unified graph computing library on top of the DataStream API. It would indeed be most suitable as a separate ecosystem project.
>
> Best,
> Yun
>
> [1] https://cwiki.apache.org/confluence/x/hAEBCw
>
> ------------------Original Mail ------------------
> Sender: Martijn Visser <mart...@ververica.com>
> Send Date: Wed Jan 5 02:58:53 2022
> Recipients: Zhipeng Zhang <zhangzhipe...@gmail.com>
> CC: David Anderson <dander...@apache.org>, Till Rohrmann <trohrm...@apache.org>, dev <dev@flink.apache.org>, User <u...@flink.apache.org>
> Subject: Re: [DISCUSS] Drop Gelly
>
> Hi Zhipeng,
>
> I think that we're seeing more code being externalised, for example with the Flink Remote Shuffle service [1] and the ongoing discussion on the external connector repository [2], so it makes sense to go for your second option. Maybe it fits under Flink Extended [3].
>
> The main question becomes who can contribute to and maintain this library. Another (intermediate) solution might be to find someone who can migrate the current Gelly codebase to Flink's DataStream API in batch mode, so that it no longer uses the DataSet API. This has recently also happened with the State Processor API [4].
>
> Best regards,
>
> Martijn
>
> [1] https://github.com/flink-extended/flink-remote-shuffle
> [2] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
> [3] https://github.com/flink-extended/
> [4] https://issues.apache.org/jira/browse/FLINK-24912
>
> On Tue, 4 Jan 2022 at 14:01, Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:
>
> Hi Martijn,
>
> Thanks for the feedback. I am not proposing to bundle the new graph library with Alink. I am +1 for dropping the DataSet-based Gelly library, but we probably need a new graph library in Flink for the possible migration.
>
> We haven't decided what to do yet and probably need more discussion. There are some possible solutions:
> 1. We include a new DataStream-based graph library in FlinkML [1], given that graphs and machine learning algorithms are often used together [2][3][4]. To achieve this, we could reuse the `AlgoOperator` interface in FlinkML.
> 2. We include a new DataStream-based graph library as a separate module/repo. This is consistent with existing libraries like Spark [4].
>
> What do you think?
>
> [1] https://github.com/apache/flink-ml
> [2] https://arxiv.org/abs/1403.6652
> [3] https://arxiv.org/abs/1503.03578
> [4] https://github.com/apache/spark
>
> Best,
> Zhipeng
>
> On Tue, 4 Jan 2022 at 15:27, Martijn Visser <mart...@ververica.com> wrote:
>
> Hi Zhipeng,
>
> Good that you've reached out. I wasn't aware that Gelly is being used in Alink. Are you proposing to write a new graph library as a successor to Gelly and bundle that with Alink?
>
> Best regards,
>
> Martijn
>
> On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:
>
> Hi everyone,
>
> Thanks for starting the discussion :)
>
> We (the Alink team [1]) are actually using part of the Gelly library to support graph algorithms (connected components, single-source shortest path, etc.) for users at Alibaba Inc.
>
> As the DataSet API is going to be dropped, shall we also provide a new graph library based on the DataStream runtime (similar to what we did for machine learning)?
>
> [1] https://github.com/Alibaba/alink
>
> On Tue, 4 Jan 2022 at 00:01, David Anderson <dander...@apache.org> wrote:
>
> Most of the inquiries I've had about Gelly in recent memory have been from folks looking for a streaming solution, and it's only been a handful.
>
> +1 for dropping Gelly
>
> David
>
> On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
> I haven't seen any changes or requests to/for Gelly in ages. Hence, I would assume that it is not really used and can be removed.
>
> +1 for dropping Gelly.
>
> Cheers,
> Till
>
> On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser <mart...@ververica.com> wrote:
>
> Hi everyone,
>
> Flink is bundled with Gelly, a Graph API library [1]. It has been marked as approaching end-of-life for quite some time [2].
>
> Gelly is built on top of Flink's DataSet API, which is deprecated and slowly being phased out [3]. It only works for batch jobs. Based on the activity in the Dev and User mailing lists, I don't see a lot of questions popping up regarding the usage of Gelly. Removing Gelly would reduce CI time and resources because we would no longer need to run its tests.
>
> I'm cross-posting this to the User mailing list to see if there are any users of Gelly at the moment.
>
> Let me know your thoughts.
>
> Martijn Visser | Product Manager
> mart...@ververica.com
>
> [1] https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/gelly/overview/
> [2] https://flink.apache.org/roadmap.html
> [3] https://lists.apache.org/thread/b2y3xx3thbcbtzdphoct5wvzwogs9sqz
>
> --
> best,
> Zhipeng