I think you want a join of type "left_anti". See the log below:

scala> import spark.implicits._
import spark.implicits._

scala> case class Foo (a: String, b: Int)
defined class Foo

scala> case class Bar (a: String, d: Double)
defined class Bar

scala> var fooDs = Seq(Foo("a", 1), Foo("b", 2), Foo("c", 3)).toDS
fooDs: org.apache.spark.sql.Dataset[Foo] = [a: string, b: int]

scala> var barDs = Seq(Bar("b", 2.1), Bar("c", 3.2), Bar("d", 4.3)).toDS
barDs: org.apache.spark.sql.Dataset[Bar] = [a: string, d: double]

scala> fooDs.join(barDs, Seq("a"), "left_anti").collect.foreach(println)
[a,1]

On Thu, Oct 12, 2017 at 1:16 PM, Shashikant Kulkarni <shashikant.kulka...@gmail.com> wrote:

> Hello,
>
> I have 2 datasets, Dataset<Class1> and other is Dataset<Class2>. I want
> the list of records which are in Dataset<Class1> but not in
> Dataset<Class2>. How can I do this in Apache Spark using Java Connector? I
> am using Apache Spark 2.2.0
>
> Thank you
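
One note on the log above: join returns an untyped DataFrame, which is why the result prints as a Row ([a,1]) rather than as a Foo. Since a left anti join keeps only the left side's columns, the result can be converted back to a typed Dataset with .as[Foo]. Below is a minimal standalone sketch of that, assuming a local SparkSession; the object name LeftAntiExample, the app name, and the local[*] master are my own placeholders and not part of the original session.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Same case classes as in the REPL log above.
case class Foo(a: String, b: Int)
case class Bar(a: String, d: Double)

// Hypothetical wrapper object, only here to make the sketch runnable.
object LeftAntiExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .appName("left-anti-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val fooDs = Seq(Foo("a", 1), Foo("b", 2), Foo("c", 3)).toDS()
    val barDs = Seq(Bar("b", 2.1), Bar("c", 3.2), Bar("d", 4.3)).toDS()

    // "left_anti" keeps the rows of fooDs whose key "a" has no match in barDs.
    // The join itself yields a DataFrame; .as[Foo] restores the typed view,
    // which works because only the left side's columns (a, b) survive.
    val onlyInFoo: Dataset[Foo] =
      fooDs.join(barDs, Seq("a"), "left_anti").as[Foo]

    onlyInFoo.collect().foreach(println) // prints Foo(a,1)

    spark.stop()
  }
}
```

For the Java API asked about in the quoted question, the overload join(Dataset<?> right, Column joinExprs, String joinType) should accept the same "left_anti" string, for example ds1.join(ds2, ds1.col("a").equalTo(ds2.col("a")), "left_anti"). I have not run that variant, so treat it as a starting point.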