Hi Martin,

good to hear that you like Flink :-)

AFAIK, there are no plans to add a containment join. The Flink community is currently working on adding support for outer joins.

Regarding a containment join, I am not sure how many use cases there are. I would rather try to implement it on top of Flink's batch API instead of adding it as an internal feature/operator to the system, because this would touch a lot of things (API, optimizer, operator implementation).
There might be better ways to implement a containment join than using a cross and a filter.

- Do you know a distributed algorithm for containment joins? Maybe it can be implemented with Flink's API.
- I guess you are implementing a generic graph framework, but can you make certain assumptions about the data, such as the relative sizes of the inputs or the avg/max size of the lists?

Contributions to Gelly (and Flink in general) are highly welcome.

Best, Fabian

2015-07-16 9:39 GMT+02:00 Martin Junghanns <martin.jungha...@gmx.net>:

> Hi everyone,
>
> at first, thanks for building this great framework! We are using Flink
> and especially Gelly for building a graph analytics stack (gradoop.com).
>
> I was wondering if there is a [planned] support for a containment join
> operator. Consider the following example:
>
> DataSet<List<Int>> left := {[0, 1], [2, 3, 4], [5]}
> DataSet<Tuple2<Int, Int>> right := {<0, 1>, <1, 0>, <2, 1>, <5, 2>}
>
> What I want to compute is
>
> left.join(right).where(list).contains(tuple.f0) :=
>
> {
> <[0, 1], <0, 1>>, <[0, 1], <1, 0>>,
> <[2, 3, 4], <2, 1>>,
> <[5], <5, 2>>
> }
>
> At the moment, I am solving that using cross and filter, which can be
> expensive.
>
> The generalization of that operator would be "set containment join",
> where you join if the right set is contained in the left set.
>
> If there is a general need for that operator, I would also like to
> contribute to its implementation.
>
> But maybe, there is already another nice solution which I didn't
> discover yet?
>
> Any help would be appreciated. Especially since I would also like to
> contribute some of our graph operators (e.g., graph summarization) back
> to Flink/Gelly (current WIP state can be found here: [1]).
>
> Thanks,
>
> Martin
>
> [1]
> https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java
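
One possible way to avoid the cross and filter discussed above, for the single-element case from the example, is to flatten each left list into (element, list) pairs and then do an ordinary equi-join on element == f0. The sketch below only illustrates that idea with plain Java collections. It is not Flink code, and the names ContainmentJoinSketch, Match and containmentJoin are made up for illustration. In Flink, the two steps would roughly correspond to a flatMap on the left input followed by a regular join. Whether this pays off depends on the list sizes, since every list is replicated once per distinct element.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class ContainmentJoinSketch {

    /** One result row: a left list paired with a matching right tuple <f0, f1>. */
    static final class Match {
        final List<Integer> list;
        final int f0;
        final int f1;

        Match(List<Integer> list, int f0, int f1) {
            this.list = list;
            this.f0 = f0;
            this.f1 = f1;
        }

        @Override
        public String toString() {
            return "<" + list + ", <" + f0 + ", " + f1 + ">>";
        }
    }

    static List<Match> containmentJoin(List<List<Integer>> left, List<int[]> right) {
        // Step 1 (a flatMap in dataflow terms): emit one (element, list) pair per
        // distinct element of each list, collected here in a hash index by element.
        Map<Integer, List<List<Integer>>> listsByElement = new HashMap<>();
        for (List<Integer> list : left) {
            for (Integer element : new LinkedHashSet<>(list)) {
                listsByElement.computeIfAbsent(element, k -> new ArrayList<>()).add(list);
            }
        }

        // Step 2 (an equi-join on element == f0): probe the index with each right tuple.
        List<Match> result = new ArrayList<>();
        for (int[] tuple : right) {
            for (List<Integer> list : listsByElement.getOrDefault(tuple[0], List.of())) {
                result.add(new Match(list, tuple[0], tuple[1]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Input data taken from the example in the quoted message.
        List<List<Integer>> left = List.of(List.of(0, 1), List.of(2, 3, 4), List.of(5));
        List<int[]> right = List.of(
                new int[] {0, 1}, new int[] {1, 0}, new int[] {2, 1}, new int[] {5, 2});
        containmentJoin(left, right).forEach(System.out::println);
    }
}
```

This covers only the "list contains tuple.f0" case. The general set containment join, where the whole right set must be contained in the left set, would need more than a single equi-join, for example counting the matched elements per pair of sets.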