I think you want a join of type "left_anti", which keeps only the rows of the left dataset that have no match in the right one. See the spark-shell log below:

scala> import spark.implicits._
import spark.implicits._

scala> case class Foo (a: String, b: Int)
defined class Foo

scala> case class Bar (a: String, d: Double)
defined class Bar

scala> var fooDs = Seq(Foo("a", 1), Foo("b", 2), Foo("c", 3)).toDS
fooDs: org.apache.spark.sql.Dataset[Foo] = [a: string, b: int]

scala> var barDs = Seq(Bar("b", 2.1), Bar("c", 3.2), Bar("d", 4.3)).toDS
barDs: org.apache.spark.sql.Dataset[Bar] = [a: string, d: double]

scala> fooDs.join(barDs, Seq("a"), "left_anti").collect.foreach(println)
[a,1]
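
The join above returns an untyped DataFrame, which is why the row prints as [a,1]. If you want a typed Dataset back, here is a minimal sketch, assuming the same spark-shell session with Foo, fooDs, barDs and spark.implicits._ still in scope (the name onlyInFoo is only illustrative). An anti join returns just the left side's columns, so the result should convert back with .as[Foo]:

```scala
// Assumes the spark-shell session above: Foo, Bar, fooDs, barDs and
// spark.implicits._ are already defined. `onlyInFoo` is an illustrative name.

// Keep the rows of fooDs whose key "a" has no match in barDs.
// join(...) returns an untyped DataFrame, so convert it back to Dataset[Foo].
val onlyInFoo = fooDs.join(barDs, Seq("a"), "left_anti").as[Foo]

onlyInFoo.collect.foreach(println)
```

Note that Dataset.except compares entire rows, so it is not a substitute when the two classes share only a key column; the anti join matches on the key alone.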


On Thu, Oct 12, 2017 at 1:16 PM, Shashikant Kulkarni <
shashikant.kulka...@gmail.com> wrote:

> Hello,
>
> I have two datasets, a Dataset<Class1> and a Dataset<Class2>. I want
> the list of records which are in Dataset<Class1> but not in
> Dataset<Class2>. How can I do this in Apache Spark using Java Connector? I
> am using Apache Spark 2.2.0.
>
> Thank you
