Sorry, I have found the reason: NULL cannot be compared directly with
an ordinary comparison operator. I have written a note about this:
https://bigcount.xyz/how-spark-handles-null-and-abnormal-values.html
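
In short, a minimal sketch of what I think is going on. The toy data, the local session and the value names below are my own assumptions, not the original dataset. When a column holds SQL NULL, a comparison such as =!= evaluates to NULL rather than true or false, and where() keeps only the rows whose predicate is true, so the NULL rows are dropped from both the "equal" and the "not equal" result:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone setup; inside spark-shell the session already
// exists as `spark`, so these two lines can be skipped there.
val spark = SparkSession.builder().master("local[1]").appName("null-demo").getOrCreate()
import spark.implicits._

// Toy data (an assumption): three real values and two SQL NULLs.
val demo = Seq("ELECTED", null, "LOST", null, "ELECTED").toDF("cand_status")

val total = demo.count()                                        // 5 rows

// For the NULL rows both predicates evaluate to NULL, so where() drops them.
val notEqual = demo.where($"cand_status" =!= "NULL").count()    // 3
val equal    = demo.where($"cand_status" === "NULL").count()    // 0

// Explicit NULL tests are the reliable way to count them.
val nulls    = demo.where($"cand_status".isNull).count()        // 2
val nonNulls = demo.where($"cand_status".isNotNull).count()     // 3

// The null-safe operator <=> always yields true or false, so negating
// it keeps the NULL rows as well.
val notElected = demo.where(!($"cand_status" <=> "ELECTED")).count()  // 3
```

Here notEqual + equal is 3, not 5; the two missing rows are exactly the ones that isNull reports.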
Thanks.
wilson wrote:
My dataset has NULL values in its columns.
Do you know why the select results below are inconsistent?
scala> dfs.select("cand_status").count()
val res37: Long = 881793
scala> dfs.select("cand_status").where($"cand_status" =!= "NULL").count()
val res38: Long = 383717
scala> d