Re: [DISCUSS] Drop Gelly

Martijn Visser Wed, 12 Oct 2022 13:57:43 -0700

Hi everyone,

I'm reviving this rather old discussion thread because I just stumbled across 
Gelly again and realized that the discussion was never finished.

I'll open up a vote thread for dropping the current DataSet-based Gelly 
library. 

Best regards,

Martijn

On 2022/01/05 03:37:18 Yun Gao wrote:
> Hi,
> 
> Many thanks for initiating the discussion!
> 
> +1 as well for dropping the current DataSet-based Gelly library so that we 
> can finally drop the legacy DataSet API. 
> 
> As for whether to keep the graph computing capability: from my side, graph 
> queries / graph computing, and chaining them with the preprocessing 
> pipeline, are real requirements. 
> We also already have the basis for a graph computing library on the 
> DataStream API with the new iteration library [1], so it would already be 
> feasible to build a unified stream/batch graph computing library on top of 
> the DataStream API. It would indeed be most suitable as a separate 
> ecosystem project. 
> 
> Best,
> Yun
> 
> [1] https://cwiki.apache.org/confluence/x/hAEBCw
> 
> 
>  ------------------Original Mail ------------------
> Sender:Martijn Visser <mart...@ververica.com>
> Send Date:Wed Jan 5 02:58:53 2022
> Recipients:Zhipeng Zhang <zhangzhipe...@gmail.com>
> CC:David Anderson <dander...@apache.org>, Till Rohrmann 
> <trohrm...@apache.org>, dev <dev@flink.apache.org>, User 
> <u...@flink.apache.org>
> Subject:Re: [DISCUSS] Drop Gelly
> 
> Hi Zhipeng,
> 
> Since we're seeing more code being externalised, for example with the 
> Flink Remote Shuffle service [1] and the ongoing discussion on the external 
> connector repository [2], I think it makes sense to go for your second 
> option. Maybe it fits under Flink Extended [3]. 
> 
> The main question becomes who can contribute and maintain this library. 
> Another (intermediate) solution might also be to find someone who can 
> migrate/move the current Gelly codebase to use Flink's DataStream API in 
> batch mode, so it wouldn't be using the DataSet API anymore. This has 
> recently also happened with the State Processor API [4]. 
> 
> Best regards,
> 
> Martijn
> 
> [1] https://github.com/flink-extended/flink-remote-shuffle
> [2] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
> [3] https://github.com/flink-extended/
> [4] https://issues.apache.org/jira/browse/FLINK-24912
> On Tue, 4 Jan 2022 at 14:01, Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:
> 
> Hi Martijn,
> 
> Thanks for the feedback. I am not proposing to bundle the new graph library 
> with Alink. I am +1 for dropping the DataSet-based Gelly library, but we 
> probably need a new graph library in Flink for the possible migration.
> 
> We haven't decided what to do yet and probably need more discussion. There 
> are some possible solutions:
> 1. We include a new DataStream-based graph library in FlinkML[1], given that 
> graphs and machine learning algorithms are more often used together 
> [2][3][4]. To achieve this, we could reuse the `AlgoOperator` interface in 
> FlinkML.
> 2. We include a new DataStream-based graph library as a separate module/repo. 
> This is consistent with existing libraries like Spark [5].
> 
> What do you think?
> 
> 
> [1] https://github.com/apache/flink-ml
> [2] https://arxiv.org/abs/1403.6652
> [3] https://arxiv.org/abs/1503.03578
> [4] https://github.com/apache/spark
> 
> Best,
> Zhipeng
> Martijn Visser <mart...@ververica.com> wrote on Tue, 4 Jan 2022 at 15:27:
> 
> Hi Zhipeng,
> 
> Good that you've reached out, I wasn't aware that Gelly is being used in 
> Alink. Are you proposing to write a new graph library as a successor of Gelly 
> and bundle that with Alink? 
> 
> Best regards,
> 
> Martijn
> On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:
> 
> Hi everyone,
> 
> Thanks for starting the discussion :)
> 
> We (the Alink team [1]) are actually using part of the Gelly library to 
> support graph algorithms (connected components, single-source shortest path, 
> etc.) for users in Alibaba Inc.
> 
> As the DataSet API is going to be dropped, shall we also provide a new graph 
> library based on the DataStream runtime (similar to what we did for machine 
> learning)?
> 
> [1] https://github.com/Alibaba/alink
> David Anderson <dander...@apache.org> wrote on Tue, 4 Jan 2022 at 00:01:
> 
> Most of the inquiries I've had about Gelly in recent memory have been from 
> folks looking for a streaming solution, and it's only been a handful. 
> 
> +1 for dropping Gelly
> 
> David
> On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann <trohrm...@apache.org> wrote:
> 
> I haven't seen any changes or requests to/for Gelly in ages. Hence, I would 
> assume that it is not really used and can be removed.
> 
> +1 for dropping Gelly.
> 
> Cheers,
> Till
> On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser <mart...@ververica.com> wrote:
> 
> Hi everyone,
> 
> Flink is bundled with Gelly, a Graph API library [1]. This has been marked as 
> approaching end-of-life for quite some time [2].
> 
> Gelly is built on top of Flink's DataSet API, which is deprecated and slowly 
> being phased out [3]. It only works on batch jobs. Based on the activity in 
> the Dev and User mailing lists, I don't see a lot of questions popping up 
> regarding the usage of Gelly. Removing Gelly would reduce CI time and 
> resources because we won't need to run tests for this anymore. 
> 
> I'm cross-posting this to the User mailing list to see if there are any users 
> of Gelly at the moment. 
> 
> Let me know your thoughts.
> 
> Martijn Visser | Product Manager
> mart...@ververica.com
> 
> [1] 
> https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/gelly/overview/
> [2] https://flink.apache.org/roadmap.html
> [3] https://lists.apache.org/thread/b2y3xx3thbcbtzdphoct5wvzwogs9sqz
> 
> 
> -- 
> best,
> Zhipeng
> 
>
