Hi Spark Users,
I am trying to understand Spark’s type system a little better. I see that
Spark implicitly casts some types to String but not others. Here is an
example.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([({"test": 1.0}, 2.0, 3.0, 4.0)],
                           ("observed", "expected", "obs_wt", "exp_wt"))
df.select(df["expected"].like("1.0")).show()
df.select(df["observed"].like("1.0")).show()

Schema of the dataframe:

StructType(List(
    StructField(observed, MapType(StringType, DoubleType, true), true),
    StructField(expected, DoubleType, true),
    StructField(obs_wt, DoubleType, true),
    StructField(exp_wt, DoubleType, true)))

The first like executes fine, even though the type of the “expected” column
is double.
The second like fails because Spark does not know how to convert a map to a
string and compare it:

pyspark.sql.utils.AnalysisException: cannot resolve '`observed` LIKE
'1.0'' due to data type mismatch: argument 1 requires string type,
however, '`observed`' is of map<string,double> type.;
'Project [observed#0 LIKE 1.0 AS observed LIKE 1.0#15]
+- LogicalRDD [observed#0, expected#1, obs_wt#2, exp_wt#3], false
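
For what it’s worth, an explicit cast does resolve, since Spark allows
casting any type (maps included) to string. A minimal sketch; the %-pattern
is only an illustrative guess, because the rendered string form of a map
varies across Spark versions:

# Explicitly casting the map column to string makes the LIKE resolvable.
# The exact string rendering of a map differs between Spark versions,
# so the pattern below is just illustrative.
df.select(df["observed"].cast("string").like("%1.0%")).show()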

Isn’t this inconsistent? Ideally, shouldn’t it fail for “DoubleType” as
well, since that is not “StringType” either? What would be the point of
types if the conversion to String happens regardless?
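
To see where the cast happens, printing the analyzed plan should help (a
sketch; I expect it to show the double column being cast to string before
the LIKE, though I have not verified this on every Spark version):

# explain(True) prints the parsed, analyzed, optimized and physical plans;
# the analyzed plan should contain an implicit cast of "expected" to string.
df.select(df["expected"].like("1.0")).explain(True)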

Can anyone from Spark community shed light on the reason for this behaviour?


-- 
Thanks and Regards,
Dhiren Amar Navani
(+1) 480-434-0661

https://www.linkedin.com/pub/dhiren-navani/41/b36/372
