Re: Fast strategy for intersect

2015-10-28 Thread Fabian Hueske
I would go for the first solution with the join. This gives the engine the highest degree of freedom:
- repartition vs. broadcast-forward
- sort-merge vs. hash-join
Best, Fabian
2015-10-28 18:45 GMT+01:00 Vasiliki Kalavri:
> Hi Martin,
>
> isn't finding the intersection of edges enough in this
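To make the "sort-merge vs. hash-join" choice concrete, here is a minimal plain-Java sketch of the two local strategies applied to an intersection of vertex IDs. It is not Flink code and not what the optimizer actually executes: the class name `IntersectStrategies`, both method names, and the use of `Long` IDs are assumptions made for illustration, and both methods assume each input is free of duplicates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only (not part of Flink or Gelly).
final class IntersectStrategies {

    // Hash strategy: build a hash table on one input, probe it with the other.
    // Results come out in the order of the probe side.
    static List<Long> hashIntersect(List<Long> build, List<Long> probe) {
        Set<Long> table = new HashSet<>(build);
        List<Long> result = new ArrayList<>();
        for (Long id : probe) {
            if (table.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    // Sort-merge strategy: sort copies of both inputs, then advance two cursors.
    // Results come out in ascending order.
    static List<Long> sortMergeIntersect(List<Long> left, List<Long> right) {
        List<Long> l = new ArrayList<>(left);
        List<Long> r = new ArrayList<>(right);
        Collections.sort(l);
        Collections.sort(r);
        List<Long> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < l.size() && j < r.size()) {
            int cmp = l.get(i).compareTo(r.get(j));
            if (cmp == 0) {
                result.add(l.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }
}
```

The hash variant tends to pay off when one side is small enough to hold in memory, while the sort-merge variant is attractive when the inputs are already sorted or both are large. With unknown input sizes, that is the kind of decision a plain join leaves to the engine.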

Re: Fast strategy for intersect

2015-10-28 Thread Vasiliki Kalavri
Hi Martin,
isn't finding the intersection of edges enough in this case? And assuming there are no duplicate edges, I believe a join should do the trick.
Cheers, -Vasia.
On 28 October 2015 at 13:15, Martin Junghanns wrote:
> Hi all!
>
> While working on FLINK-2905, I was wondering what a good (
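The "no duplicate edges" assumption matters because an equi-join emits one record per matching pair. The plain-Java sketch below illustrates this with a naive nested-loop join; it is not Gelly code, and the class `JoinIntersect`, its methods, and the representation of an edge as a (source, target) `Map.Entry` are all assumptions made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only (not part of Flink or Gelly).
final class JoinIntersect {

    // An edge is modelled as a (source, target) pair of vertex IDs.
    static Map.Entry<Long, Long> edge(long source, long target) {
        return new SimpleEntry<>(source, target);
    }

    // Naive equi-join on the whole edge: one output record per matching pair.
    static List<Map.Entry<Long, Long>> join(
            List<Map.Entry<Long, Long>> left,
            List<Map.Entry<Long, Long>> right) {
        List<Map.Entry<Long, Long>> result = new ArrayList<>();
        for (Map.Entry<Long, Long> l : left) {
            for (Map.Entry<Long, Long> r : right) {
                if (l.equals(r)) {
                    result.add(l);
                }
            }
        }
        return result;
    }

    // Removes repeated edges while keeping first-seen order.
    static List<Map.Entry<Long, Long>> distinct(List<Map.Entry<Long, Long>> edges) {
        return new ArrayList<>(new LinkedHashSet<>(edges));
    }
}
```

If an edge occurs m times on the left and n times on the right, the join emits it m * n times, so the result is only a set intersection when the inputs are duplicate-free or a distinct step follows the join.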

Fast strategy for intersect

2015-10-28 Thread Martin Junghanns
Hi all!
While working on FLINK-2905, I was wondering what a good (and fast) way to compute the intersection of two data sets (Gelly vertices in my case) with unknown size would be. I came up with three ways to solve this:
Consider two sets:
DataSet<Vertex<K, VV>> verticesLeft = this.getVertices();
Dat