Hi Martin, I'm really glad to see that you've started using Gelly :)
I think that a graph summarization library method would be a great addition! Let me know if you need help and if you want to discuss ideas or other methods. Cheers, Vasia. On 17 July 2015 at 12:25, Martin Junghanns <martin.jungha...@gmx.net> wrote: > Hi Fabian, hi Stephen, > > thanks for answering my question. Good hint with the list replication, I > will benchmark this vs. cross + filter. > > Best, Martin > > > Am 17.07.2015 um 11:17 schrieb Stephan Ewen: > > I would rewrite this to replicate the list into tuples: > > "foreach x in list: emit (x, list)" > Then join on fields 0. > > This replicates the lists, but makes the join very efficient. > > On Fri, Jul 17, 2015 at 12:26 AM, Fabian Hueske <fhue...@gmail.com> wrote: > >> Hi Martin, >> >> good to hear that you like Flink :-) >> AFAIK, there are no plans to add a containment join. The Flink community >> is currently working on adding support for outer joins. >> Regarding a containment join, I am not sure about the number of use >> cases. I would rather try to implement it on top of Flink's batch API >> instead of adding it as an internal feature/operator to the system because >> this would touch a lot of things (API, optimizer, operator implementation). >> >> There might be better ways to implement a containment join than using a >> cross and a filter. >> - Do you know a distributed algorithm for containment joins? Maybe it can >> be implemented with Flink's API. >> - I guess, you are implementing a generic graph framework, but can you >> make certain assumptions about the data such as relative sizes of the >> inputs or avg/max size of the lists, etc.? >> >> Contributions to Gelly (and Flink in general) are highly welcome. >> >> Best, Fabian >> >> >> 2015-07-16 9:39 GMT+02:00 Martin Junghanns <martin.jungha...@gmx.net>: >> >>> Hi everyone, >>> >>> at first, thanks for building this great framework! We are using Flink >>> and especially Gelly for building a graph analytics stack (gradoop.com >>> ). >>> >>> I was wondering if there is a [planned] support for a containment join >>> operator. Consider the following example: >>> >>> DataSet<List<Int>> left := {[0, 1], [2, 3, 4], [5]} >>> DataSet<Tuple2<Int, Int>> right := {<0, 1>, <1, 0>, <2, 1>, <5, 2>} >>> >>> What I want to compute is >>> >>> left.join(right).where(list).contains(tuple.f0) := >>> >>> { >>> <[0, 1], <0,1>>, <[0, 1], <1, 0>>, >>> <[2, 3, 4], <2, 1>>, >>> <[5], <5, 2> >>> } >>> >>> At the moment, I am solving that using cross and filter, which can be >>> expensive. >>> >>> The generalization of that operator would be "set containment join", >>> where you join if the right set is contained in the left set. >>> >>> If there is a general need for that operator, I would also like to >>> contribute to its implementation. >>> >>> But maybe, there is already another nice solution which I didn't >>> discover yet? >>> >>> Any help would be appreciated. Especially since I would also like to >>> contribute some of our graph operators (e.g., graph summarization) back >>> to Flink/Gelly (current WIP state can be found here: [1]). >>> >>> Thanks, >>> >>> Martin >>> >>> >>> [1] >>> >>> https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java >>> >>> >>> >> > >