Hi All,

I have a DataFrame with the schema shown below.

Filtering on a column with an equality expression returns an unexpected count in 1.5.1, but the same code works as expected in 1.4.0.
What is the difference?

scala> eventDF.printSchema()
root
 |-- id: string (nullable = true)
 |-- event: string (nullable = true)
 |-- entityType: string (nullable = true)
 |-- entityId: string (nullable = true)
 |-- targetEntityType: string (nullable = true)
 |-- targetEntityId: string (nullable = true)
 |-- properties: string (nullable = true)

scala> eventDF.groupBy("entityType").agg(countDistinct("entityId")).show
+----------+------------------------+
|entityType|COUNT(DISTINCT entityId)|
+----------+------------------------+
|   ib_user|                    4751|
|      user|                    2091|
+----------+------------------------+


----- does not work (bug?): returns 1219 instead of the expected 2091
scala> eventDF.filter($"entityType" === "user").select("entityId").distinct.count
res151: Long = 1219

scala> eventDF.filter(eventDF("entityType") === "user").select("entityId").distinct.count
res153: Long = 1219

scala> eventDF.filter($"entityType" equalTo "user").select("entityId").distinct.count
res149: Long = 1219

----- works as expected
scala> eventDF.map { e => (e.getAs[String]("entityId"), e.getAs[String]("entityType")) }.filter(x => x._2 == "user").map(_._1).distinct.count
res150: Long = 2091

scala> eventDF.filter($"entityType" in "user").select("entityId").distinct.count
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
res155: Long = 2091

scala> eventDF.filter($"entityType" !== "ib_user").select("entityId").distinct.count
res152: Long = 2091


However, all of the above code works as expected in 1.4.0.

Thanks.
