Hi Spark Users, I am trying to understand Spark's type system a little better. I see that Spark implicitly casts some types to String but not others. Here is an example:
    df = spark.createDataFrame(
        [({"test": 1.0}, 2.0, 3.0, 4.0)],
        ("observed", "expected", "obs_wt", "exp_wt"),
    )
    df.select(df["expected"].like("1.0")).show()
    df.select(df["observed"].like("1.0")).show()

Schema of the dataframe:

    StructType(List(
        StructField(observed,MapType(StringType,DoubleType,true),true),
        StructField(expected,DoubleType,true),
        StructField(obs_wt,DoubleType,true),
        StructField(exp_wt,DoubleType,true)))

The second line executes fine, even though the type of the "expected" column is double. The third line fails because Spark does not know how to convert a map to a string for the comparison:

    pyspark.sql.utils.AnalysisException: cannot resolve '`observed` LIKE '1.0''
    due to data type mismatch: argument 1 requires string type, however,
    '`observed`' is of map<string,double> type.;
    'Project [observed#0 LIKE 1.0 AS observed LIKE 1.0#15]
    +- LogicalRDD [observed#0, expected#1, obs_wt#2, exp_wt#3], false

Isn't this inconsistent? Ideally, shouldn't it fail for DoubleType as well, since that is not StringType? What would be the point of types if the conversion to strings happens regardless? Can anyone from the Spark community shed light on the reason for this behaviour?

--
Thanks and Regards,
Dhiren Amar Navani
(+1) 480-434-0661
https://www.linkedin.com/pub/dhiren-navani/41/b36/372
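
P.S. For reference, here is a minimal sketch (using the df defined above) of how to see the implicit cast in the analyzed plan, and one possible workaround for the map column; the exact plan text may vary by Spark version:

    # Print the extended plan for the double column. The analyzed plan
    # shows that the analyzer inserts an implicit cast to string before
    # applying LIKE, i.e. something like:
    #   Project [cast(expected#1 as string) LIKE 1.0 ...]
    df.select(df["expected"].like("1.0")).explain(True)

    # There is no implicit map-to-string cast, hence the AnalysisException
    # on the map column. One workaround is to pull the value out by key
    # first; getItem yields a double, which LIKE will again implicitly
    # cast to string:
    df.select(df["observed"].getItem("test").like("1.0")).show()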