Hi all,
I have a DataFrame with the schema shown below.
Filtering with an equality expression returns a wrong count in Spark 1.5.1, but the same code works as expected in 1.4.0.
What changed between the two versions?
scala> eventDF.printSchema()
root
|-- id: string (nullable = true)
|-- event: string (nullable = true)
|-- entityType: string (nullable = true)
|-- entityId: string (nullable = true)
|-- targetEntityType: string (nullable = true)
|-- targetEntityId: string (nullable = true)
|-- properties: string (nullable = true)
scala> eventDF.groupBy("entityType").agg(countDistinct("entityId")).show
+----------+------------------------+
|entityType|COUNT(DISTINCT entityId)|
+----------+------------------------+
|   ib_user|                    4751|
|      user|                    2091|
+----------+------------------------+
----- does not work (bug?)
scala> eventDF.filter($"entityType" === "user").select("entityId").distinct.count
res151: Long = 1219
scala> eventDF.filter(eventDF("entityType") === "user").select("entityId").distinct.count
res153: Long = 1219
scala> eventDF.filter($"entityType" equalTo "user").select("entityId").distinct.count
res149: Long = 1219
----- works as expected
scala> eventDF.map{ e => (e.getAs[String]("entityId"), e.getAs[String]("entityType")) }.filter(x => x._2 == "user").map(_._1).distinct.count
res150: Long = 2091
scala> eventDF.filter($"entityType" in "user").select("entityId").distinct.count
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
res155: Long = 2091
scala> eventDF.filter($"entityType" !== "ib_user").select("entityId").distinct.count
res152: Long = 2091
However, all of the above code returns 2091, as expected, in 1.4.0.
Thanks.