I would go with an outer join as Stefano suggested. Outer joins can be executed as hash joins, which will probably be more efficient than a sort-based groupBy/reduceGroup. Also, outer joins are more intuitive and simpler, IMO.
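To illustrate what a hash-based outer join does for this question, here is a minimal plain-Java sketch. It assumes in-memory lists instead of Flink DataSets, and `Triple` and `subtractByKey` are hypothetical names for illustration, not Flink API. The keys (fields 0 and 1) of B are hashed once (build side), then A is streamed against that table (probe side), and only the A records without a match are kept. These are exactly the rows a left outer join pairs with null. In Flink, the same check would go into the join function of `leftOuterJoin` with `where(0, 1).equalTo(0, 1)`, emitting the left record only when the right one is null.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of a hash anti-join; NOT Flink code.
class AntiJoinSketch {

    // Hypothetical stand-in for Flink's Tuple3<Integer, Integer, Double>.
    record Triple(int f0, int f1, double f2) {}

    // Returns the records of a whose (f0, f1) key does not occur in b.
    // Field f2 is ignored when comparing, as in the original question.
    static List<Triple> subtractByKey(List<Triple> a, List<Triple> b) {
        // Build phase: hash the join keys of B.
        Set<List<Integer>> bKeys = new HashSet<>();
        for (Triple t : b) {
            bKeys.add(List.of(t.f0(), t.f1()));
        }
        // Probe phase: keep A records without a matching key in B,
        // i.e. the rows a left outer join would pair with null.
        List<Triple> result = new ArrayList<>();
        for (Triple t : a) {
            if (!bKeys.contains(List.of(t.f0(), t.f1()))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> a = List.of(new Triple(1, 1, 0.5), new Triple(1, 2, 0.7), new Triple(2, 2, 0.9));
        List<Triple> b = List.of(new Triple(1, 2, 3.0));
        // Keeps (1,1,0.5) and (2,2,0.9); (1,2,0.7) is dropped because key (1,2) is in B.
        System.out.println(subtractByKey(a, b));
    }
}
```

Only the smaller input needs to fit in the hash table, which is why no sort of either input is required, unlike the groupBy/reduceGroup approach.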
2016-04-07 12:35 GMT+02:00 Stefano Baghino <stefano.bagh...@radicalbit.io>:

> Perhaps an outer join can do the trick as well, but I don't know which one
> would perform better.
>
> On Thu, Apr 7, 2016 at 12:05 PM, Lydia Ickler <ickle...@googlemail.com>
> wrote:
>
>> Never mind! I figured it out with groupBy and reduceGroup.
>>
>> Sent from my iPhone
>>
>> > On 07.04.2016 at 11:51, Lydia Ickler <ickle...@googlemail.com> wrote:
>> >
>> > Hi,
>> >
>> > If I have two DataSets A and B of type Tuple3<Integer,Integer,Double>, how
>> > would I get the subset of A (based on fields (0,1)) that does not occur
>> > in B?
>> > Is there maybe an already implemented method?
>> >
>> > Best regards,
>> > Lydia
>> >
>> > Sent from my iPhone
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit