Hi all, I have a DataFrame like this.
The equality expression does not work in 1.5.1, but it works as expected in 1.4.0. What is the difference?

scala> eventDF.printSchema()
root
 |-- id: string (nullable = true)
 |-- event: string (nullable = true)
 |-- entityType: string (nullable = true)
 |-- entityId: string (nullable = true)
 |-- targetEntityType: string (nullable = true)
 |-- targetEntityId: string (nullable = true)
 |-- properties: string (nullable = true)

scala> eventDF.groupBy("entityType").agg(countDistinct("entityId")).show
+----------+------------------------+
|entityType|COUNT(DISTINCT entityId)|
+----------+------------------------+
|   ib_user|                    4751|
|      user|                    2091|
+----------+------------------------+

----- does not work (bug?): returns 1219 instead of 2091

scala> eventDF.filter($"entityType" === "user").select("entityId").distinct.count
res151: Long = 1219

scala> eventDF.filter(eventDF("entityType") === "user").select("entityId").distinct.count
res153: Long = 1219

scala> eventDF.filter($"entityType" equalTo "user").select("entityId").distinct.count
res149: Long = 1219

----- works as expected

scala> eventDF.map{ e => (e.getAs[String]("entityId"), e.getAs[String]("entityType")) }.filter(x => x._2 == "user").map(_._1).distinct.count
res150: Long = 2091

scala> eventDF.filter($"entityType" in "user").select("entityId").distinct.count
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
res155: Long = 2091

scala> eventDF.filter($"entityType" !== "ib_user").select("entityId").distinct.count
res152: Long = 2091

However, all of the code above works in 1.4.0.

Thanks.
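
For reference, the equivalence expected from the queries above can be sketched with plain Scala collections, with no Spark involved. The `Event` case class, the `EqualityCheck` object, and the sample rows are hypothetical and only stand in for `eventDF`; the point is that filtering on equality and then counting distinct ids should agree with the grouped distinct count.

```scala
// Hypothetical stand-in for eventDF: only the two columns used in the queries.
case class Event(entityId: String, entityType: String)

object EqualityCheck {
  // Made-up sample rows, including duplicate ids within a type.
  val events: Seq[Event] = Seq(
    Event("u1", "user"),
    Event("u1", "user"),
    Event("u2", "user"),
    Event("i1", "ib_user"),
    Event("i2", "ib_user"),
    Event("i2", "ib_user"),
    Event("i3", "ib_user")
  )

  // Analogue of groupBy("entityType").agg(countDistinct("entityId")).
  val distinctPerType: Map[String, Int] =
    events.groupBy(_.entityType).map { case (entityType, rows) =>
      entityType -> rows.map(_.entityId).distinct.size
    }

  // Analogue of filter($"entityType" === t).select("entityId").distinct.count.
  def distinctIdsEqualTo(t: String): Int =
    events.filter(_.entityType == t).map(_.entityId).distinct.size

  // Analogue of filter($"entityType" !== t).select("entityId").distinct.count.
  // This matches the equality filter on the other type only because the
  // sample has exactly two entity types and no null values.
  def distinctIdsNotEqualTo(t: String): Int =
    events.filter(_.entityType != t).map(_.entityId).distinct.size
}
```

Pasted into the REPL, `EqualityCheck.distinctIdsEqualTo("user")`, `EqualityCheck.distinctIdsNotEqualTo("ib_user")`, and `EqualityCheck.distinctPerType("user")` all return the same number, which is the behaviour the three `===` / `equalTo` queries were expected to show.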